Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Crypto::Bcrypt::Password#== #15270

Closed

Conversation

jgaskins
Copy link
Contributor

Use case: my User records (via DB::Serializable) aren't considered == because their password properties are not considered == despite all properties being equal. This leads to confusing spec failures when doing actual_user.should eq expected_user.

Since Crypto::Bcrypt::Password doesn't appear to accrue state, as an alternative to adding == it seems like it could instead be a struct vs a class? This would do the same thing. Both of these ideas pass in spec/std/crypto/bcrypt/password_spec.cr. I went with the def_equals because it's the monkeypatch I use in apps that require passwords (you can't monkeypatch a class to a struct), but I'm also happy to convert it to a struct.

Previously, it used `Reference` equality, which compares identity rather
than value.
@straight-shoota
Copy link
Member

straight-shoota commented Dec 12, 2024

Thanks for this initiative. As always, it's recommended to open an issue first to discuss proposals.

This method existed previously but was removed after the discussion in #7753. I think the main issue is that Crypto::Bcrypt::Password is meant as tool for creating and verifying passwords, not for storage.
Password hashes are supposed to be stored as String or equivalent data types.

So I'd suggest to consider using a different type for password storage in your model as an alternative solution.
That should be much more efficient for equality check because there's no need to deconstruct the hash string. It can be compared verbatim.

We can certainly talk about use cases and reconsider the decision on #7753, but that'll need deep arguments.
We should close this PR for now, as it's not mergeable without repealing a previous decision.

@jgaskins
Copy link
Contributor Author

That issue seems to be about using == to check that a string hashes to the password’s hashed value. Did I overlook something in there about comparing two equivalent Crypto::Bcrypt::Password instances with the same state?

@jgaskins
Copy link
Contributor Author

All I’m looking for is that two objects of the same type with the same state return true with ==.

I think the main issue is that Crypto::Bcrypt::Password is meant as tool for creating and verifying passwords, not for storage. Password hashes are supposed to be stored as String or equivalent data types.

I’m sincerely curious how you think I’m storing it in my DB, because what you’re describing is what I’m doing. I’m storing it in the DB as a string and decoding it into a password when retrieving it. I don’t know of another way to store it, tbh.

@oprypin
Copy link
Member

oprypin commented Dec 12, 2024

Crypto::Bcrypt::Password == String got removed in #7753

but this PR adds
Crypto::Bcrypt::Password == Crypto::Bcrypt::Password

@straight-shoota
Copy link
Member

That's a valid point then 👍

@straight-shoota
Copy link
Member

straight-shoota commented Dec 12, 2024

I’m sincerely curious how you think I’m storing it in my DB, because what you’re describing is what I’m doing. I’m storing it in the DB as a string and decoding it into a password when retrieving it. I don’t know of another way to store it, tbh.

I don't mean for permanent storage. But you must be storing a Password instance in your User model because it seems to affect the equality of that model.
My comment is supposed to dissuade from doing that. Instead, the user's password property should be a String (or equivalent). Then you need to instantiate Bcrypt::Password only temporarily for the password verification process.

@jgaskins
Copy link
Contributor Author

I disagree. That's analogous to holding JSON blobs in their string format and only parsing them temporarily to access their properties, then discarding the parsed data structure.

A password passed through bcrypt is not an opaque string like the cleartext password is. It's converted to a string to send over the wire, such as to a database, but it's a data structure containing 4 scalars. This is the concept that Crypto::Bcrypt::Password represents. It doesn't contain the bcrypt algorithm — it's just the parsed data structure.

@ysbaddaden
Copy link
Contributor

Issue: it becomes possible to compare bcrypt hashes in non constant time.

@straight-shoota
Copy link
Member

straight-shoota commented Dec 13, 2024

I disagree. That's analogous to holding JSON blobs in their string format and only parsing them temporarily to access their properties, then discarding the parsed data structure.

That's not a good analogy. Password hashes are predominantily used for authentication operations, which are typically very infrequent. In most cases when you instantiate a user model, it won't do any password verification.
It's quite wasteful to instantiate Bcrypt::Password if you don't need it. It allocates the instance itself, plus an array and five strings. I presume this could be optimized, which would reduce the effort. But that won't change the fact that it's completely useless unless you actually want to verify the password.

A password passed through bcrypt is not an opaque string like the cleartext password is.

I disagree. For anything else but the process of verifying and creating a password, the hash should be treated as opaque.
Your user model doesn't need to know how the hash works. It just needs to store a blob of data and make that available for the password verification process.

Just because the Bcrypt::Password type exists, should not be meant as motivation to use everywhere you can.
Maybe it was a mistake to expose this in an OOP style and the API should just contain of a .verify method.

@oprypin
Copy link
Member

oprypin commented Dec 13, 2024

I looked into the topic of why we have different versions of bcrypt hash digests, and also why we don't do anything about them during verification. This is because all versions are actually supposed to behave the same, just that various implementations of bcrypt chose to bump the version when they fixed a bug in their implementation.
https://stackoverflow.com/a/36225192

  • $2$ original, underspecified version
  • $2a$ = 2 but more strictly specified
  • $2x$ = 2a but indicates that it was created in a buggy version of PHP
  • $2y$ = 2a but indicates that it was created in a fixed version of PHP
  • $2b$ = 2a but indicates that it was created in a fixed version of OpenBSD

@oprypin
Copy link
Member

oprypin commented Dec 13, 2024

With that out of the way, I can explain why it is invalid to blindly compare Crypto::Bcrypt::Password instances.

  • To achieve permissive enough comparison, the version needs to be explicitly ignored for all currently known cases.

    • Correct usage:

      require "crypto/bcrypt/password"
      stored = Crypto::Bcrypt::Password.create("super secret", cost: 10).to_s.sub("$2a$", "$2y$")
      retrieved = Crypto::Bcrypt::Password.new(stored)
      p retrieved.verify("super secret")  # => true
    • Wrong usage:

      require "crypto/bcrypt/password"
      
      class Crypto::Bcrypt::Password
        def_equals_and_hash version, cost, salt, digest
      end
      
      stored = Crypto::Bcrypt::Password.create("super secret", cost: 10).to_s.sub("$2a$", "$2y$")
      retrieved = Crypto::Bcrypt::Password.new(stored)
      p Crypto::Bcrypt::Password.create("super secret", cost: 10) == retrieved  # => false
  • For potential future revised versions of bcrypt, the version must be actually used. You need to have the ability to switch to a new version for all newly created hashes but keep using the old version for verification against hashes that you already have stored.

  • Most importantly and most realistically: to avoid mismatches with the cost!!
    If you ever change the cost, you need to be able to keep using old hashes that you already have and match their cost, not use whatever is the new one.

    • Correct usage:

      require "crypto/bcrypt/password"
      
      cost = 10
      stored = Crypto::Bcrypt::Password.create("super secret", cost: cost).to_s
      retrieved = Crypto::Bcrypt::Password.new(stored)
      
      cost = 12  # Decided to change the cost in the future
      
      p retrieved.verify("super secret")  # => true
    • Wrong usage:

      require "crypto/bcrypt/password"
      
      class Crypto::Bcrypt::Password
        def_equals_and_hash version, cost, salt, digest
      end
      
      cost = 10
      
      stored = Crypto::Bcrypt::Password.create("super secret", cost: cost).to_s
      retrieved = Crypto::Bcrypt::Password.new(stored)
      
      cost = 12  # Decided to change the cost in the future
      
      p Crypto::Bcrypt::Password.create("super secret", cost: cost) == retrieved  # => false

As such, this pull request is invalid.

@oprypin
Copy link
Member

oprypin commented Dec 13, 2024

Further footgun №1

People might already be using the wrong equality comparison via .to_s == .to_s

require "crypto/bcrypt/password"

cost = 10

stored = Crypto::Bcrypt::Password.create("super secret", cost: cost).to_s

cost = 12  # Decided to change the cost in the future

p Crypto::Bcrypt::Password.create("super secret", cost: cost).to_s == stored  # => false

This tells me that we can probably never change the value of DEFAULT_COST, or there will be outages.

The bcrypt module probably should not have had a "default" cost defined, only suggested via examples.

@oprypin
Copy link
Member

oprypin commented Dec 13, 2024

Further footgun №2

def_equals without any arguments creates an equals method that always returns true!!!

@jgaskins clearly must have thought that by default it uses all fields, but actually it uses no fields. So this PR makes all Password objects equal to each other! Maybe def_equals without arguments should fail, or print a warning considering backwards compatibility

@oprypin oprypin closed this Dec 13, 2024
@jgaskins
Copy link
Contributor Author

Issue: it becomes possible to compare bcrypt hashes in non constant time.

@ysbaddaden Because of this PR? Why would anyone be using this method to compare hashes cryptographically?

Maybe it was a mistake to expose this in an OOP style and the API should just contain of a .verify method.

@straight-shoota Maybe, if the rest of the core team holds the same position you do about it. Having the algorithm exposed in the Crypto::Bcrypt module and another object type specifically to represent a password makes your position shaky at best. If you’re this intent on it only being used on strings, the Password type makes no sense at all and the Bcrypt module should just have something like a verify_password method that receives the hash and the cleartext string.

To achieve permissive enough comparison, the version needs to be explicitly ignored for all currently known cases.

@oprypin This makes sense.

The code examples you gave don't, though. They return false, but not for the reason you think. Here is your first "wrong usage" code example, with the only modification being to show the values instead of simply whether they're equal:

require "crypto/bcrypt/password"

class Crypto::Bcrypt::Password
  def_equals_and_hash version, cost, salt, digest
end

stored = Crypto::Bcrypt::Password.create("super secret", cost: 10).to_s.sub("$2a$", "$2y$")
retrieved = Crypto::Bcrypt::Password.new(stored)
puts Crypto::Bcrypt::Password.create("super secret", cost: 10)
# $2a$10$LHorOxEFXwvVsnvWJSPYTeDInFat9GPJeHg6r5EfgaUB48bo63kHi
puts retrieved 
# $2y$10$XVMuriHkS9NYM2EcSBGLD.LV58fY15GNSv/eVwMDesBLeu0PmQsPO

They aren't mismatched solely due to the different version values. They're mismatched because the only component that matches is the cost.

Unlike SHA, bcrypt isn't deterministic. Every time you call Password.create, a new salt is generated and, by extension, a new digest. This is by design and is the reason the salt exists:

require "crypto/bcrypt/password"

alias Password = Crypto::Bcrypt::Password # This is so clunky to type

2.times { puts Password.create("password") }
# $2a$11$KjbncjfAhTe4xOk5qrem/OpZi7NdbZVq8Sm5.YL2wEe1HlmE.2L4C
# $2a$11$OiAG3SyUL95/NHvguErQfeQN1xv/nOyirpCS4n.3VEuJuci300gs.

This tells me that we can probably never change the value of DEFAULT_COST, or there will be outages.

Again, this draws a conclusion based on incorrect information. Here is a modified version of your code example where we don't change cost (all I did was comment out the line):

require "crypto/bcrypt/password"

cost = 10

stored = Crypto::Bcrypt::Password.create("super secret", cost: cost).to_s

# cost = 12  # Decided to change the cost in the future

p Crypto::Bcrypt::Password.create("super secret", cost: cost).to_s == stored  # => false

The output is exactly the same.

@jgaskins clearly must have thought that by default it uses all fields, but actually it uses no fields.

This is precisely what I thought. TIL.

@ysbaddaden
Copy link
Contributor

Maybe it was a mistake to expose this in an OOP style and the API should just contain of a .verify method.

It was modeled after Ruby's bcrypt gem... I agree we don't need an object, a couple methods would be sufficient:

  • Crypto::BCrypt.hash(password, *, salt = nil, cost = DEFAULT_COST) : String
  • Crypto::BCrypt.verify(hash, password) : Bool

It would avoid the issues mentioned above, limit the exposed API, simplify the internal implementation, and we could avoid a bunch of intermediary allocations (especially Bytes).

We should definitely consider moving to such a limited API, then deprecate the Crypto::BCrypt and Crypto::BCrypt::Password classes 👍

@straight-shoota
Copy link
Member

straight-shoota commented Dec 13, 2024

I suppose the only use case for the Password type is that you can use one instance for multiple verify operations. This might save a little bit of overhead, but if the string parsing was implemented efficiently, it would be negligible.
Also I can't think of many scenarios where you'd want to verify multiple passwords against the same hash. Maybe if the password was mistyped and tried again. But this seems very week.

we could avoid a bunch of intermediary allocations (especially Bytes).

They're actually String allocations (from String#split and one is split further into two parts on character index). Dunno why.
This should work perfectly fine with non-allocating subslices of Bytes 🤷

@ysbaddaden
Copy link
Contributor

I can't think of many scenarios where you'd want to verify multiple passwords against the same hash.

Exactly. Maybe a rainbow table or for hacking purposes... in which case there are faster alternatives.

@oprypin
Copy link
Member

oprypin commented Dec 13, 2024

@jgaskins
Ah yes I didn't take into account the fact that creating a bcrypt hash is based on random salt, so the "invalid" examples are uhh... invalid. And my statement about "we can probably never change the value of DEFAULT_COST" / "Further footgun №1" is fully debunked.

But the initial big point is still applicable - it can be misleading to directly compare two bcrypt hashes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants