|
Luhn FormulaSocial Insurance Number and Credit Card ValidationRich Nistuk10/1/2003Introduction - What is the Luhn Formula?I had a gig where I needed to create a form in which users were asked to enter a Social Insurance Number as a parameter for a search against a database of clients. SINs are large and prone to entry errors, so governments and financial institutions use an algorithm called the Luhn Formula to check the validity of the number when it is entered. In the application I was writing this is done in a form on the client side. The form would block the form from being posted if the SIN was invalid. The Luhn formula, based on ANSI X4.13(also known as the modulus 10 algorithm) is:
Preparation - Using the AlgorithmAs always, when presented with an algorithm that I need to implement in an application, I try an example or two by hand (using a spreadsheet!) to see if I understand it. Of course, I used my own SIN, which I will not do here. Instead, here is a fake SIN that I know is valid: 123-456-782
The sum is 40 which is a number that ends in zero! The number is valid. Exercise: Try your own SIN, or credit card number number. Transpose a couple of adjacent, non-equal, digits; the resulting SIN should be invalid. So the algorithm is pretty simple. The only ugly bits are what to do if doubling the values in step one result in a value larger than nine, easy for humans to do, but a bit of work for computers. The solution is to use the MODULUS and DIV functions, or their equivalents. You want something like (x DIV 10 + x MOD 10 ), where x is the two digit value and DIV is the integer division function and MOD is the modulus function. Lets try the number 12:
That's pretty simple, and it gives us a clue about how to deal with the other slightly nasty bit: How do you determine, programatically that a number ends in 0? Well, just use the 'MOD 10'! Implentation - The first CutSo now we have the tools to write a straight forward SIN/Credit card number validator! We'll do it in JavaScript. Here's a first cut. It follows the algorithm quite closely 1 function LUHN(values) 2 { 3 var temp = 0; 4 var sum = 0; 5 6 // step 1 7 for (i=values.length-2; i>=0; i-=2) { 8 temp=values[i]*2; 9 if(temp>9) 10 temp = Math.round(temp/10 - 0.5) + temp % 10; 11 sum += temp; 12 } 13 14 // step 2 15 for (i=values.length-1; i>=0; i-=2) { 16 temp=values[i]; 17 sum += temp; 18 } 19 20 // step 3 21 return (sum % 10) == 0; 22 } Line 1 defines the JavaScript function signature, it's difficult to tell in JavaScript, but the parameter values must be an array of integers for this to work. In production code a comment would be placed at the top of the function that states this. The values parameter would also be tested to ensure that it was an array and that it contained only integers.
Lines 3 and 4 declare and initialize two temporary variables that will be used later. Lines 7 to 12 perform the first step, iterating back from the end, doubling and summing the alternating digits. Inside of this loop ther is an if test that catches the digits that, when doubled, are greater than 9. Then use the DIV and MOD trick on them. In JavaScript MOD is performed with the '%' operator, and the Math.round(x) method is used to perform the DIV job. Note that the round method will round to the closest integer, we always want the lowest, hence the need to subtract 0.5 from the results of the division. The other digits are simply added to the sum in the second loop, lines 15 to 18. And finally, the luhn function must return a true or a false, so use '%', the MOD operator, to see what the ones digit of the sum is and return true if the digit is zero and false otherwise. The luh functon can be tested with the following HTML code: 1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"> 2 <html> 3 <head> 4 <title>Simple LUHN Algorithm Test</title> 5 </head> 6 <body> 7 <script language="javascript"> 8 function luhn(values) 9 { 10 var temp =0; 11 var sum = 0; 12 13 // step 1 14 for (i=values.length-2; i>=0; i-=2) { 15 temp=values[i]*2; 16 if(temp>9) 17 temp = Math.round(temp/10 - 0.5) + temp % 10; 18 sum += temp; 19 } 20 21 // step 2 22 for (i=values.length-1; i>=0; i-=2) { 23 temp=values[i]; 24 sum += temp; 25 } 26 27 // step 3 28 return (sum % 10) == 0; 29 } 30 31 // The test of the luhn function will be performed with 32 // two values; The first, 123-456-782, which we know is 33 // valid and another, 123-456-780 which we know is not 34 // valid. 35 var values = new Array(1,2,3,4,5,6,7,8,2); 36 document.write("123-456-782 is a valid SIN number, "); 37 document.write("LUHN must return true: "); 38 document.write(luhn(values)+" And here's the output:
It works! But we should re-write the code to be a bit more efficient. OptimizationNow that we have the function working, and we have good idea what's going on, let's re-write it to make it smaller and quicker, if possible. 1 function luhn(values) 2 { 3 var i = values.length; 4 var j = 0; 5 var sum = 0; 6 while (i-- > 0) { 7 tmp = values[i]; 8 if(j++ & 1) { 9 if((tmp <<= 1) > 9) { 10 tmp -= 10; 11 sum++; 12 } 13 } 14 sum += tmp; 15 } 16 return (sum % 10) == 0; 17 } Don't panic! It's not that bad... The most obvious change is that the two for loops have been replaced by a single while loop. Looking at the two for loops in the previous example it's pretty obvious that all the work could be done in a single loop. We just need some way of telling the alternating numbers apart. That's done by the first if statement. The if statement in line 8 contains j & 1, just ignore the '++' for now, where j is an index variable that counts each element in the values array. The '&' character is the bitwise AND operator. It works like this:
That is, as the value of j increases the value of j & 1 alternates between 0 and 1. So this is how to differentiate between alternating numbers. At first glance it's tempting to reuse the value i and avoid the the overhead of declaring and incrementing j. Unfortunately, a look at step one of the specification: "Beginning with the second digit from the right...", shows that we need to use a second counter variable because we don't, in general, know how big the value of the number we are checking is. It could be an odd number of digits or an even number of digits, we wouldn't be guaranteed of skipping the first digit from the right. Which brings us to the increment ('++') operator on line 8. This is the bit that allows us to iterate through the values in the array. Note the placement of the operator. Not ++j but j++, this instructs the JavaScript interpretor to increment the variable after evaluating j & 1. So, we get to line 8, cech to see if we are on an element of values that we have to double, and then increment j. If we need to double the value of increment then we move on to line 9. Line 9 introduces an operator that might be new to you, the left shift operator. The 'normal' version of it looks like this: x = a << b
it works by shifting the digits of the binary representation of the first operand to the left by the number of places specified by the second operand. The spaces created to the right are filled with zeros, and any overflowed digits disapear. Here's an example of how it works(Nope that I'm expressing binary numbers with a 'b' on the end, thus '3' is '11b'): x = 3 << 1
x = 11b << 1
x = 110b
x = 6
The effect of this is that 3 got doubled, which is exactly what we wanted. The unary version of this operator is: x <<= a
which is equivalent to x = x << a
So the argument of the if statement is simply saying, double the value of tmp and see if it's bigger than '9'. If the doubled value is bigger than 9, then we simply subtract 10 from tmp and add the 1 to sum. The value of tmp will be added after the if statement block on line 14. Line 14 is a clever bit, we always add the value of tmp to the sum. Sometimes tmp needs some tweaking, but for the most part it's always added to the sum. Finally we get to line 16. Which is the same as the original function. Exercise:Do you believe that the second version of the luhn function behaves the same as the first? Try it out. Replace the old function with the new function in the HTML test script. What Did we Gain? - An AsideIt is said that the first rule of optimization is "Don't". Pretty good advice. The implication of the advice is make sure that what you've got works. Then worry about making it faster/smaller/whatever. That said, we know that both versions of the code work (you did the exercises, right?). We know that the second version is smaller than the first in terms of both size and use of expensive operators (such as MOD, Math.round). What else did we gain? I wrote a quick test jig to give the function HUGE numbers to test. It created an array with 1000 elements each containing a random number from 0 - 9, and then timed how long it took to both of the functions to execute 50 times. The total time divided by 50 was taken to be the average execution time. Then I doubled the size of the array, and ran the test, again. Then doubled, ran... You get the picture. Here's the results:
Of course, this is a bit of a stupid test, nobody has a credit card number greater than ten or so digits. But arrays of that size would verified in less than a millisecond. That said, an average speed increase of 18% is not too shabby. An ExampleWe have a very nice luhn functon. It's fast, it does what we expect, lets use it to validate some numbers!ConclusionThe LUHN formula as expressed by the functions listed in this article can be used to check the validity of SIN numbers or Credit Card Numbers. Of course, they can't be used to check that the numbers are real! Just use it is ther first line of defense against getting garbage data from the user. If you have any comments, or questions about this article please feel free to contact me via email (rich@danzisoft.ca). Thank you for taking the time to read this article, and I hope you get the oppourtunity to read a few more! |
|