-
-
Notifications
You must be signed in to change notification settings - Fork 927
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
security flaw in authentication scheme or odd api description #2746
Comments
I transferred this to the rubygems.org repo since it seems like a better fit for the issue, although I'm not fully sure. |
RubyGems.org doesn't support Basic access authentication via HTTP headers. The only way to access the API is using API key.
I'm not sure I do understand this part well, anyway here is implementation how we lookup for API key in database. rubygems.org/app/controllers/api/base_controller.rb Lines 35 to 40 in 762568a
We do store hashed key and we do have index over this column. Line 22 in 762568a
Line 33 in 762568a
|
Does RubyGems.org store the API keys in plaintext as well? If so: shame on them. |
This sounds super-strange to me. That explanation is basically "we don't care but delegate to someone else" when it comes to storing any user data. They may have to reconsider their approach if this has indeed been their statement. |
Yes, strange to me too. However, the API doc is not exactly good in this aspect. Time to check all docs and components for proper definitions and implementation. |
Hi @hadmut , please help us understand why does following quote from simi not answer your question:
We hash the API key before lookup in DB. ie index is on hashed values of the API keys. |
Well, it's somewhat mediocre, much better than nothing, but not yet as good as common practise. But you need to step a little deeper into cryptography in order to understand. In general, the main reason for doing all that hashing stuff, is to protect against the case that the server's credential database falls into atacker's hands. Some sort of read access, e.g. a backup file or a compromised account. Obviously, if the database contains plaintext credentials, you can easily steal them. e.g. in gemstash, which stores the credentials as plaintext, you can simply query them from the database: sqlite3 gemstash.db 'select * from authorizations;' There's another problem with that: If you automatically install such systems with tools like puppet or ansible, you don't always have a source for reliable secret files or a server which is fully protected against read only attacks, e.g. because the installation server is used by a full company. Therefore, credentials not be stored as plaintext unless needed (e.g. cryptographic secret or private keys). So What's the problem? The problem is, it's deterministic. The same credential (=password) always results in the same hash. Which makes it weak. For example, if John and Jane for some reason have the same password, even if you don't know it, you can recognize that it is the same. Even worse: There is lists of most common used passwords, and lists of stolen passwords from broken servers. Once you have these lists, you can easily prepare their hashes. E.g. very common passwords are 'password' and '123456'. Silly, but good to explain the problem here. Once you have a list of such common or ever used passwords, it's no deal to prepare a list of their hashes Digest::SHA256.hexdigest('password') => "5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8" and thus hack these passwords by a simple reverse lookup. Because on every server for every account on world, the digest for a given password is always the same. Therefore, for those passwords that are vulnerable for brute force or password list attacks, they are not any more secure than plaintext, because once you see the hash, you now the password. It's not a one-way thing. It's not a trapdoor. To avoid that, state of the art is to use a 'salted' hash. This means, that for every stored credential/password a unique random string is created, and the hash is generated over (I do simplify things here for explaining, do not directly implement that) the concatenation, i.e. Digest::SHA256.hexdigest(credential + randomstring). And both the randomstring and the digest are stored together in the database. The randomstring is not really confidential, because it's not important to keep it secret, but to have it randomly and uniquely created (i.e. a fresh one for every single act of storing a hashed credential). Whenever you verify a password for user authentication, you have to fetch the random string from the database and calculate the digest. The consequence is: You cannot build a codebook, a list of common passwords and their hashes. Because even if you choose a common stupid password like 123456, the hash looks different every time. Check the /etc/passwords of your Linux distribution. It works like that. If you set the same password again or for a different account, the entry looks completely different. E.g.: require 'bcrypt' Each single one of these would verify with '123456', but you need to recalculate the hash, you cannot keep it in a database for reverse lookup. So if an attacker gets access to the database, he cannot simply do a revery lookup against his database of the ten millions of most lovely passwords. He cannot easily check whether there is a hash in the database he knows the plaintext for. He needs to try every single password of his list against every single entry in the database. Which is much, much more work. You can improve this if you don't just use a single hash, but a number of nested repeated hash calls Hash(Hash(Hash(Hash(...)))), which would take e.g. a full second to calculate on latest desktop CPUs. Compare hacking a database with 100 entries and a single weak password contained in such a password list: A simple sha digest could be solved by a reverse lookup, so the attacker would know in milliseconds the plaintext password. Now, he is still able to find the weak password, but checking all his 3 millions known passwords against the 100 entries with 1 second calculation time each would cost him 3e61001 = 3e8 seconds of CPU time. Which is about 9.5 years. On the average, after half the time he would be successful, ie. after 4.75 years. Compare this to the milliseconds it would take him for a reverse lookup. So even if there is a weak password, or one that is used for different accounts, the door is not yet open as wide as with such as simple scheme. BTW, this is state of the art of security for about 20, 30 years. I think it was invented for /etc/passwords end of the eighties or early nineties. Now the downside for you: You cannot index these hashes by knowning the plaintext password. Why not? Because that's exactly what this is intended to spoil, because that's an attack. The idea is that even if you know the plaintext password, let it be '123456', you cannot easily find that in the database. Because that's exactly what this is intended to make impossible. You first need to find the hashed string, then you can take the salt from it and try to check, whether the given password matches, i.e. authentication succeeds. You cannot easily look whether there is an entry for a given password, you have to check every single entry whether it matches the password. If you have 1000 entries, it takes you up to 1000 seconds, average 500 seconds of cpu time. That's why you need the username as an index. Imagine you have 50,000 password entries. It's unfeasible to effectively verify my poor password 'submarine' against every single on of these. But once I login as and then check the password against that entry to find it matching. This is, in simplified words, why your hash scheme is weak, even if it uses sha256, and not uses a username. Because if you can use the password itself as an index, it is weak. If you can use the secret as an index, it shines through. Furthermore, your API description is poor. It was misunderstood by the people writing gemstash, who made it even worse, because they misunderstood it to store the credentials in plaintext. Another hint: Since it is difficult to keep your passwords confidential these days, even the good ones, because there's so many sorts of malware, it is a good idea to use some second factor thing, e.g. u2f / fido2. Since these do work differently, you cannot use the credential as an index anyway, and need a username as an index. Some non-cryptographic hint: In general, it is not a good idea to always store credentials in a local app-specific database, because you can't reliably maintain it, e.g. disable accounts of users who left the organisation or who's notebook has been compromised. If you want to use common practices like connecting to an LDAP server (even AD), you also need some index value, i.e. the username. |
You seem to be assuming that we are using user password as API key. This is not the case, our API keys are generated with |
|
Hi,
I had filed a bug against gemstash, because it stores credentials in plaintext in its database instead of hashing them, what's a standard practice:
rubygems/gemstash#288 (comment)
Their response was that the authorization scheme is not a particular property of gemstash, but a standard defined by the gem api:
https://guides.rubygems.org/rubygems-org-api/#api-authorization
which looks as some misunderstandiing of the HTTP authorization header, see
RFC7617
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Authorization
https://en.wikipedia.org/wiki/Basic_access_authentication
which uses the Authorization-Header like
where the base64 encoded string is a username and a password, joined with a colon. This allows to used the username as an index into a password database and to find the hash belonging to the username, then check it. gemstash just uses a random string and needs to store it in plaintext in order to find it.
I'm not sure, whether this is just a problem of gemstash, or the gem api in common, but even in the former case you should clarify in https://guides.rubygems.org/rubygems-org-api/#api-authorization
Using just a random API KEY makes it hard to store the key hashed in a database, since there is no index then and you need to check all entries.
The text was updated successfully, but these errors were encountered: