Data Analysis of PIN Numbers

Data is an interesting topic of exploration. In this example, we look at data as it relates to PIN numbers. A PIN is usually all that stands between your bank account and a determined hacker. They can clone your debit card, but they can’t clone the PIN you use to authorize your transactions. If you use a common PIN, though, they may be able to guess it easily.

This article by Nick Berry on DataGenetics is an interesting read for understanding the power of data analysis and why data is important when trying to solve a problem. In this case, the problem is working out which PIN numbers are the most common, and why some PIN combinations are more common than others. Excerpts from his analysis follow.

I was able to find almost 3.4 million four-digit passwords. Every single one of the 10,000 combinations of digits from 0000 through to 9999 was represented in the dataset.

The most popular password is 1234 … it’s staggering how popular this password appears to be. Utterly staggering at the lack of imagination … nearly 11% of the 3.4 million passwords are 1234!!!

The next most popular 4-digit PIN in use is 1111, with over 6% of passwords being this.

In third place is 0000 with almost 2%.
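The ranking above boils down to a frequency count over the password list. Here is a minimal sketch of that idea in Python. The function name `pin_shares` and the tiny `sample` list are made up for illustration; they are not Berry's code or data.

```python
from collections import Counter

def pin_shares(pins, top=3):
    """Return the `top` most common PINs with their share of all entries."""
    counts = Counter(pins)
    total = len(pins)
    return [(pin, count / total) for pin, count in counts.most_common(top)]

# Toy sample for illustration only (NOT the real 3.4 million entry dataset).
sample = ["1234"] * 4 + ["1111"] * 3 + ["0000"] * 2 + ["7395"]

for pin, share in pin_shares(sample):
    print(f"{pin}: {share:.0%}")
```

PINs are kept as strings rather than integers so that leading zeros survive: 0000 and 0007 would otherwise collapse to 0 and 7.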

(Statistically, with 10,000 possible combinations, if passwords were uniformly randomly distributed, we would expect the twenty most popular passwords to account for just 0.2% of the total, not the 26.83% encountered.)
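The 0.2% figure is just 20 divided by 10,000. A short Python sketch of that arithmetic follows, using the numbers quoted in the text; the helper name `expected_uniform_share` is an illustrative assumption.

```python
from fractions import Fraction

def expected_uniform_share(num_pins, total_combinations=10_000):
    """Share of all passwords that `num_pins` distinct PINs would cover
    if every combination were equally likely."""
    return Fraction(num_pins, total_combinations)

expected = expected_uniform_share(20)   # twenty PINs out of 10,000
observed = Fraction(2683, 10_000)       # the 26.83% quoted in the text

print(f"expected under uniformity: {float(expected):.2%}")
print(f"observed / expected: about {float(observed / expected):.0f}x")
```

In other words, those twenty PINs show up roughly 134 times more often than a uniform distribution would predict.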

Looking more closely at the top few records, all the usual suspects are present: 1111, 2222, 3333 … 9999, as well as 1212 and (snigger) 6969.

It’s not a surprise to see patterns like 1122 and 1313 occurring high up in the list, nor 4321 or 1010.

2001 makes an appearance at #19. 1984 follows not far behind in position #26, and James Bond fans may be interested to know 0007 is found between the two of them in position #23 (another variant, 0070, follows not much further behind at #28).

The first “puzzling” password I encountered was 2580 in position #22. What is the significance of these digits? Why should so many people select this code to make it appear so high up the list?

Then I realized that 2580 is straight down the middle of a telephone keypad! (Interestingly, this is very compelling evidence confirming the hypothesis that a 4-digit password list is a great proxy for a PIN number database. If you look at the numeric keypad on a PC keyboard you’ll see that 2580 is slightly more awkward to type on the PC than on a phone because the order of keys on a keyboard is inverted. Cash machines and other terminals that take credit cards use phone-style numeric pads. It appears that many people have an easy to type/remember PIN number for their credit card and are re-using the same four digits for their online passwords, where the “straight down the middle” mnemonic no longer applies.)
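The keypad observation can be checked mechanically. The Python sketch below models both layouts as digit-to-(row, column) maps and tests whether a PIN runs straight down a single column. The layouts are simplified, and placing the PC keypad's wide 0 key in the middle column is an assumption made for illustration.

```python
# Digit -> (row, column), with row 0 at the top of the pad.
PHONE_PAD = {
    "1": (0, 0), "2": (0, 1), "3": (0, 2),
    "4": (1, 0), "5": (1, 1), "6": (1, 2),
    "7": (2, 0), "8": (2, 1), "9": (2, 2),
    "0": (3, 1),
}

# Simplified PC numeric keypad: digit rows are inverted relative to a phone.
# Assumption: the wide 0 key is treated as sitting in the middle column.
PC_PAD = {
    "7": (0, 0), "8": (0, 1), "9": (0, 2),
    "4": (1, 0), "5": (1, 1), "6": (1, 2),
    "1": (2, 0), "2": (2, 1), "3": (2, 2),
    "0": (3, 1),
}

def is_straight_down(pin, pad):
    """True if the PIN's keys sit in one column, each one row below the last."""
    positions = [pad[digit] for digit in pin]
    same_column = len({col for _, col in positions}) == 1
    rows = [row for row, _ in positions]
    steps_down = all(b - a == 1 for a, b in zip(rows, rows[1:]))
    return same_column and steps_down

print(is_straight_down("2580", PHONE_PAD))  # True
print(is_straight_down("2580", PC_PAD))     # False
```

On the PC layout the same digits run up the middle column (2, 5, 8) and then jump back to the bottom for 0, which is why the mnemonic only works on a phone-style pad.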
