Performance: Reduce the time complexity of Faker::Crypto
generators
#2938
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Motivation / Background
This Pull Request has been created because the
Faker::Crypto
methods are generating extraordinarily long random strings to run through each hashing algorithm. The performance of each method withinFaker::Crypto
can be improved by up to 80% by balancing the complexity of the random strings for each hash algorithm.Additional Information
The MD5 hash algorithm returns a string of 32 characters within the range of "a - f" and "0 - 9", which is a range of 16 characters. The total number of possible combinations with this setup is 16^32.
The method
Faker::Crypto.md5
computes the MD5 hash of a random string like this:Digging a little deeper we see that
Lorem.characters
returns a default number of 255Alphanumeric
characters:Going further we find out that
Faker::Alphanumeric.alphanumeric
returns random characters within the range of "a - z" and "0 -9", which is a range of 36 characters. The total number of possible combinations with this setup is 36^255, which is mind-bogglingly larger than 16^32 (the total number of possible combinations for the MD5 hash algorithm).Optimization
The total number of random characters passed through the MD5 hashing algorithm within
Faker::Crypto
can be safely reduced from 255 down to a specific number. You can find that number by solving for the minimum value ofx
within:36^x > 16^32
. This allows MD5 collisions to occur before (non-unique) collisions withinFaker::Lorem.characters
.We can find that perfect number for each hash algorithm by running this ruby solver:
Which provides this output:
From this output we can see the point at which
36^x > 16^32
:This shows that
Faker::Lorem.characters(number: 24)
has317830109213583906223287396307975536640
less possible combinations than the MD5 hash algorithm, whileFaker::Lorem.characters(number: 25)
has467998910543825597179764993024768081920
more possible combinations than the MD5 hash algorithm.We can safely reduce the number of random characters to
25
for theFaker::Crypto.md5
algorithm while still returning deterministically unique hashes. Here are the other optimal values for each algorithm:Performance Benchmark
You can see from the benchmark below that after reducing library complexity the
Faker::Crypto
methods are up to 80% faster, while theFaker::Omniauth
methods enjoy performance gains up to 50% depending upon how heavily they rely upon theFaker::Crypto
methods.Checklist
Before submitting the PR make sure the following are checked:
[Fix #issue-number]