Monday, March 27, 2017

Universal Password Blacklist

One password-rule to rule them all: Universal Password Blacklist

When someone uses a password, it proves that it is not very "unique"; just being used once either proves it was already low-entropy, or makes it lower-entropy (because the user could have it written down, the system that accepted it might store it in the clear, etc). So no one, anywhere, ever, should be allowed to create an account with that password again.
How to you prevent a password from every being used again? Simple, create a public, universal blacklist for passwords. This one stroke, by itself, forces users to invent ever-more-entropy-laden passwords as time passes.
Of course, you need to do it securely. Which is the rest of this blog post.

Part 1:
My first thought (from a few years ago) is that when passwords are replaced (invalidated) in anyone's system, those passwords should be published, totally publicly.

When you change your password, you always have to enter your old password, and the new password. Also, when you delete an account, you usually have to enter your old password to do it. Either way, this transaction includes a "delete this password" component.
Some systems would automatically store this as a local blacklist of some kind, maybe just in the hashed form; but as far as I know, no one has tried to share these blacklists with other companies, and certainly no one publishes them to the world, which is basically what I am proposing.

If you're on board with black-listing these passwords from your system forever, why not publish them? They're no good anymore, so you could even store these now-obsolete password in the clear! (there are actually potential down-sides to storing in the clear, but since I remove this below, I won't go into them)

So send the now-blacklisted password to a giant repository in-the-sky (cloud). Get a bunch of companies to adopt this common system, and you are on your way to creating a Universal Password Blacklist. You could have a bunch of separate services implement this independently, and source extra black-listed values from each other, in any way that each one finds acceptable.

You end up with a distributed database, kindof block-chain style, but where it may be perfectly acceptable that some databases never end up agreeing with each other.

Part 2:
The "blacklist" part would necessitate checking if a new password is pre-existing. And this would always require a server-side check -- because no client-side system is going to download a terabyte of passwords just to create a new account, that'd be ka-razy. So you need a secure way to check if your new password exists in this Universal Password Blacklist. Luckily, we know how to do that.
1) Adopt a "cryptographic hash" that everyone is happy with
2) Run you tentative new password through this hash
3) Call new-service, to check if this hashed-value exists in their database of passwords
If it exists, it's a no-go, you need to input a better password. If it doesn't exist, go ahead and create the account.

Bonus: No Part 1 Necessary!
Alert security-wonks may have realized that this "read" can also function as a "write", So in Part 2, the "read" to check if that hashed value exists, can be implemented as a "write", like a SQL "insert". If the value already exists, you get a "duplicate key error", and you return to the caller that they should not allow the password that translated into this hashed value. If it did not exist, you successfully write it to the Universal Password Blacklist, and return to the client that this password has never been used before.

Side Note:
This hash should be used only for this system. That is, if someone else used exactly the same process for their internal password-storage, then everyone could "brute-force" attack the hashed values, and reverse it to find original passwords. Ideally this is still very difficult, but there is no reason to not add your own salt-and-pepper-hash to your system, that is distinct from the one adopted for the Universal Password Blacklist, and then this problem is non-existent.

Extra Risk & Mitigation
Risk:  The only real reason I can think of to not accept random inputs from all-comers is if you're worried about DOS-style flooding. Imagine someone hates security, so they just flood you with random "passwords", and you have to just soak up all this data and store it permanently, which is a burden that gives no benefit.

Migitation: I think you could force a client to solve a "hard" problem (burn a number of CPU cycles) per use. This should give a sufficient disincentive to fill your system with noise.

I've run this by a few security-conscious friends of mine, and have found nothing to dissuade me that this is a great idea. I would love it if someone could find a problem with it. Or if someone, somewhere, would implement it. Either one would be great. I claim no patent, or any other ridiculous IP-right, on this idea, so please take it and use it!

Wednesday, January 11, 2017

In Defense of Passphrases

Ever since the XKCD comic on Password Strength became popular, I've heard more and more disparaging remarks about how passphrases are worse than more "random" passwords. I don't understand all the hating on passphrases; the basic idea of them, as I see it, is that words are easier for humans to memorize, and create associations between, than random gibberish characters.

Now it's a given that both "gibberish" passwords and long passphrases can both be done poorly -- "correct horse battery staple" is now a terrible password, because it was featured in the comic. But so is "2143658701badcfe" (even if you somehow think that string was random, the fact that it now appears in this blog post makes it a bad choice). But I think these naysayers do not understand the value that passphrases adds -- it is easier (for most people) to remember words than random characters, of the same entropy (if you aren't familiar with "entropy", think "randomness"). But let's try to prove it with some simple examples.

First, how much entropy is enough? That's a complex question, but for our purposes let's just say 80 bits; this is based on this Q&A entry. Whether 80 bits is enough or not doesn't really matter -- if you want 160 bits, just double the lengths of all the values below.

What does 80 bits of entropy look like in English? The English language allegedly has around 1,000,000 words. Now we can't use them all; for one thing, very rare words are hard to remember. So let's pick from the most common 10,000 words. I'm using the 10,000 words at the top of this github page. I made that list by taking another list, and spending just a few minutes cleaning it. But I don't think anyone would object to 10,000 being a reasonable number of words for someone to know, however you come up with the list.

Now each word has a 1 in 10,000 chance of being selected. This doesn't go quite evenly into 80 bits, but 6 words works out to be 82% of the 80 bits. (7 words would be 820000% of 80 bits, so let's stick with 6 words)

I grabbed 6 random numbers from 1 to 10,000, and got:
  • 6225, 1738, 4836, 6378, 7361, 8406.
Looking up those words on my list 10,000 word list gives:
  • objections, shoulders, breathe, comrade, angrily, vs
That's what 80 bits of entropy looks like in English. So how does that compare to more "conventional" randomly generated passwords?
  • Hex: 63485AE5638C1EDCC61E
    • 20 hex digits is exactly 80 bits of entropy.
  • Decimal: 236663118018716201382515
    • 24 digits is ~80% of the entropy of 80 bits; close enough
  • Base64: kiydPJHQh4jL7
    • A lot of systems try to use all the number, upper and lower case letters, and sometimes other characters thrown in. This is pretty awful for humans, both because it takes longer to type in, and it's often hard to tell a 1 from I from l, 0 from O, etc. But including here for comparison. The math works out that 13 characters in base64 is 2^78; that is only 25% of 2^80, it's close enough.
  • Passphrase: objections shoulders breathe comrade angrily vs
So now, you be the judge. You have to memorize one of these 5 choices; if you succeed, you will live a long and safe life, if you fail, your identity will be stolen and you will be miserable. Which one do you choose?

Or maybe you're trying to be "practical"; which one is easier to type in? Well I just timed myself typing in each one, on my normal keyboard, and my times were: 10 seconds, 8 seconds, 5 seconds, 7 seconds. So the terrible base64 was the fastest, presumably only due to the very low character count; but typing the long passphrase was second in speed. But I certainly wouldn't choose the base64 option, especially if you consider what it's like to type that into a phone/tablet; all the numbers and capital letters require multiple taps, it's terrible. Whereas the whole words could be swiped-in, since they are all recognizable, common words. I'm too lazy to try timing myself on a phone right now, but I would speculate that I can swipe-typing 6 words much faster than I can enter any of the other three random sets of characters above.

Now a real scientific test would be to formalize this a bit more, and run real memory tests on humans. But I think I have proved my point.

Just for fun/reference, here's the javascript code I used to help with the above:

var base64 = function(n){return '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz+/'[n];}
var rnd = function(max){return Math.floor(Math.random() * max)};
var rndDigits = function(base, digits){var s = '';
 for (var i = 0; i < digits; i++)
  s += base64(rnd(base));
 return s;}
console.log('Binary: ' + rndDigits(2, 80));
console.log('Decimal: ' + rndDigits(10, 24));
console.log('Hex: ' + rndDigits(16, 20));
console.log('Base64: ' + rndDigits(64, 13));
console.log(rnd(10000) + ', ' + rnd(10000) + ', ' + rnd(10000) + ', ' + rnd(10000) + ', ' + rnd(10000) + ', ' + rnd(10000));
// for marking times, I just pressed enter before and after each combination
document.addEventListener('keydown', function (e) { if (13 == e.keyCode) { console.log(new Date()); } }, false);