algorithm change
4kimov committed Sep 9, 2023
1 parent db0c000 commit 20efe1f
Showing 9 changed files with 191 additions and 169 deletions.
9 changes: 9 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,14 @@
# CHANGELOG

**v0.4.0:** **⚠️ BREAKING CHANGE**
- **Breaking change**: IDs change. Algorithm has been fine-tuned for better performance [[Issue #11](https://github.com/sqids/sqids-spec/issues/11)]
- `alphabet` cannot contain multibyte characters
- `minLength` upper limit has increased from alphabet length to `255`
- Max blocklist re-encoding attempts has been capped at the length of the alphabet - 1
- Minimum alphabet length has changed from 5 to 3
- `minValue()` and `maxValue()` functions have been removed
- Max integer encoding value is `PHP_INT_MAX`

**v0.3.1:**
- Bug fix: spec update (PR #7): blocklist filtering in uppercase-only alphabet [[PR #7](https://github.com/sqids/sqids-spec/pull/7)]

18 changes: 9 additions & 9 deletions README.md
@@ -53,34 +53,34 @@ Simple encode & decode:

```php
$sqids = new Sqids();
$id = $sqids->encode([1, 2, 3]); // "8QRLaD"
$id = $sqids->encode([1, 2, 3]); // "86Rf07"
$numbers = $sqids->decode($id); // [1, 2, 3]
```

> **Note**
> 🚧 Because of the algorithm's design, **multiple IDs can decode back into the same sequence of numbers**. If it's important to your design that IDs are canonical, you have to manually re-encode decoded numbers and check that the generated ID matches.
Randomize IDs by providing a custom alphabet:
Enforce a *minimum* length for IDs:

```php
$sqids = new Sqids('FxnXM1kBN6cuhsAvjW3Co7l2RePyY8DwaU04Tzt9fHQrqSVKdpimLGIJOgb5ZE');
$id = $sqids->encode([1, 2, 3]); // "B5aMa3"
$sqids = new Sqids('', 10);
$id = $sqids->encode([1, 2, 3]); // "86Rf07xd4z"
$numbers = $sqids->decode($id); // [1, 2, 3]
```

Enforce a *minimum* length for IDs:
Randomize IDs by providing a custom alphabet:

```php
$sqids = new Sqids('', 10);
$id = $sqids->encode([1, 2, 3]); // "75JT1cd0dL"
$sqids = new Sqids('FxnXM1kBN6cuhsAvjW3Co7l2RePyY8DwaU04Tzt9fHQrqSVKdpimLGIJOgb5ZE');
$id = $sqids->encode([1, 2, 3]); // "B4aajs"
$numbers = $sqids->decode($id); // [1, 2, 3]
```

Prevent specific words from appearing anywhere in the auto-generated IDs:

```php
$sqids = new Sqids('', 10, ['word1', 'word2']);
$id = $sqids->encode([1, 2, 3]); // "8QRLaD"
$sqids = new Sqids('', 0, ['86Rf07']);
$id = $sqids->encode([1, 2, 3]); // "se8ojk"
$numbers = $sqids->decode($id); // [1, 2, 3]
```
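
The README note in this diff says that several IDs can decode to the same numbers, and that callers who need canonical IDs must re-encode the decoded numbers and compare. A minimal sketch of that check follows. `IdCodec`, `decodeCanonical()` and `LenientHexCodec` are hypothetical names. `IdCodec` only assumes the `encode(array): string` / `decode(string): array` pair used in the README, and `LenientHexCodec` is a toy stand-in (not the Sqids algorithm) so the example runs without the library.

```php
<?php

// Hypothetical interface: only the encode()/decode() pair used in the README examples.
interface IdCodec
{
    /** @param array<int> $numbers */
    public function encode(array $numbers): string;

    /** @return array<int> */
    public function decode(string $id): array;
}

/**
 * Returns the decoded numbers only when $id is the canonical encoding of them,
 * i.e. re-encoding the decoded numbers reproduces $id exactly; otherwise null.
 *
 * @return array<int>|null
 */
function decodeCanonical(IdCodec $codec, string $id): ?array
{
    $numbers = $codec->decode($id);
    if ($numbers === []) {
        return null; // empty or invalid ID
    }

    return $codec->encode($numbers) === $id ? $numbers : null;
}

// Toy stand-in for Sqids (NOT the Sqids algorithm): decode() accepts both
// upper- and lowercase hex digits, so several IDs map to the same numbers.
final class LenientHexCodec implements IdCodec
{
    public function encode(array $numbers): string
    {
        return implode('.', array_map('dechex', $numbers));
    }

    public function decode(string $id): array
    {
        if (preg_match('/^[0-9a-f]+(\.[0-9a-f]+)*$/i', $id) !== 1) {
            return [];
        }

        return array_map('hexdec', explode('.', $id));
    }
}
```

With the library itself, `decodeCanonical()` could type-hint `SqidsInterface` instead, because the interface shown further down declares the same `encode()` and `decode()` methods. This sketch also treats an empty decode result as invalid, which is a design choice of the sketch and not something the library does.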

1 change: 1 addition & 0 deletions composer.json
@@ -4,6 +4,7 @@
"license": "MIT",
"keywords": [
"sqids",
"hashids",
"generate",
"encode",
"decode",
110 changes: 41 additions & 69 deletions src/Sqids.php
@@ -598,21 +598,26 @@ public function __construct(
$alphabet = self::DEFAULT_ALPHABET;
}

if (strlen($alphabet) < 5) {
throw new InvalidArgumentException('Alphabet length must be at least 5');
if (mb_strlen($alphabet) != strlen($alphabet)) {
throw new InvalidArgumentException('Alphabet cannot contain multibyte characters');
}

if (strlen($alphabet) < 3) {
throw new InvalidArgumentException('Alphabet length must be at least 3');
}

if (count(array_unique(str_split($alphabet))) !== strlen($alphabet)) {
throw new InvalidArgumentException('Alphabet must contain unique characters');
}

$minLengthLimit = 255;
if (
!is_int($minLength) ||
$minLength < self::minValue() ||
$minLength > strlen($alphabet)
$minLength < 0 ||
$minLength > $minLengthLimit
) {
throw new InvalidArgumentException(
'Minimum length has to be between ' . self::minValue() . ' and ' . strlen($alphabet)
'Minimum length has to be between 0 and ' . $minLengthLimit
);
}
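
The new constructor checks these options in order: no multibyte characters, at least 3 characters, all characters unique, and a `minLength` between 0 and 255. The sketch below restates those checks as a hypothetical standalone function, `validateSqidsOptions()`, in the same order as the diff. One change is deliberate. Instead of comparing `mb_strlen()` with `strlen()`, it rejects any byte of 0x80 or above using a byte-mode `preg_match()`. That catches every multibyte UTF-8 character without depending on the mbstring extension.

```php
<?php

/**
 * Hypothetical helper mirroring the option checks in the new constructor.
 * Throws InvalidArgumentException on the first rule that fails.
 */
function validateSqidsOptions(string $alphabet, int $minLength): void
{
    // Every byte of a multibyte UTF-8 character is >= 0x80; without the /u flag PCRE matches bytes.
    if (preg_match('/[\x80-\xff]/', $alphabet) === 1) {
        throw new InvalidArgumentException('Alphabet cannot contain multibyte characters');
    }

    if (strlen($alphabet) < 3) {
        throw new InvalidArgumentException('Alphabet length must be at least 3');
    }

    if (count(array_unique(str_split($alphabet))) !== strlen($alphabet)) {
        throw new InvalidArgumentException('Alphabet must contain unique characters');
    }

    $minLengthLimit = 255;
    if ($minLength < 0 || $minLength > $minLengthLimit) {
        throw new InvalidArgumentException('Minimum length has to be between 0 and ' . $minLengthLimit);
    }
}
```

`minLength` is no longer bounded by the alphabet length, so a 3-character alphabet accepts any `minLength` up to 255. The padding loop in `encodeNumbers()` keeps appending shuffled copies of the alphabet until the target length is reached.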

@@ -638,8 +643,8 @@ public function __construct(
* Encodes an array of unsigned integers into an ID
*
* These are the cases where encoding might fail:
* - One of the numbers passed is smaller than `minValue()` or greater than `maxValue()`
* - A partition number is incremented so much that it becomes greater than `maxValue()`
* - One of the numbers passed is smaller than 0 or greater than `maxValue()`
* - An n-number of attempts has been made to re-generated the ID, where n is alphabet length + 1
*
* @param array<int> $numbers Non-negative integers to encode into an ID
* @return string Generated ID
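
The docblock above caps re-generation at alphabet length + 1 attempts. In the new `encodeNumbers()`, this cap is a recursion on `$increment` that throws once `$increment > strlen($this->alphabet)`. The same control flow can be written as a bounded loop. In the sketch below, `encodeWithRetries()` is hypothetical. `$generate` represents building an ID for a given increment, and `$isBlocked` stands in for `isBlockedId()`.

```php
<?php

/**
 * Hypothetical loop form of the blocklist retry in encodeNumbers():
 * tries increments 0..$alphabetLength (inclusive) and returns the first ID
 * that is not blocked.
 *
 * @param callable(int): string  $generate  builds an ID for a given increment
 * @param callable(string): bool $isBlocked blocklist check
 */
function encodeWithRetries(callable $generate, callable $isBlocked, int $alphabetLength): string
{
    for ($increment = 0; $increment <= $alphabetLength; $increment++) {
        $id = $generate($increment);
        if (!$isBlocked($id)) {
            return $id;
        }
    }

    // Equivalent to reaching `$increment > strlen($this->alphabet)` in the recursive version.
    throw new InvalidArgumentException('Reached max attempts to re-generate the ID');
}
```

The increment is added to the offset before the offset is reduced modulo the alphabet length. As a result, increments 0 and `strlen($alphabet)` select the same rotation, and the last attempt reproduces the first ID. At most `strlen($alphabet)` distinct IDs are therefore tried.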
@@ -650,81 +655,64 @@ public function encode(array $numbers): string
return '';
}

$inRangeNumbers = array_filter($numbers, fn ($n) => $n >= self::minValue() && $n <= self::maxValue());
$inRangeNumbers = array_filter($numbers, fn ($n) => $n >= 0 && $n <= self::maxValue());
if (count($inRangeNumbers) != count($numbers)) {
throw new \InvalidArgumentException(
'Encoding supports numbers between ' . self::minValue() . ' and ' . self::maxValue()
throw new InvalidArgumentException(
'Encoding supports numbers between 0 and ' . self::maxValue()
);
}

return $this->encodeNumbers($numbers, false);
return $this->encodeNumbers($numbers);
}

/**
* Internal function that encodes an array of unsigned integers into an ID
*
* @param array<int> $numbers Non-negative integers to encode into an ID
* @param bool $partitioned If true, the first number is always a throwaway number (used either for blocklist or padding)
* @param int $increment An internal number used to modify the `offset` variable in order to re-generate the ID
* @return string Generated ID
*/
protected function encodeNumbers(array $numbers, bool $partitioned = false): string
protected function encodeNumbers(array $numbers, int $increment = 0): string
{
if ($increment > strlen($this->alphabet)) {
throw new InvalidArgumentException('Reached max attempts to re-generate the ID');
}

$offset = count($numbers);
foreach ($numbers as $i => $v) {
$offset += ord($this->alphabet[$v % strlen($this->alphabet)]) + $i;
}
$offset %= strlen($this->alphabet);
$offset = ($offset + $increment) % strlen($this->alphabet);

$alphabet = substr($this->alphabet, $offset) . substr($this->alphabet, 0, $offset);
$prefix = $alphabet[0];
$partition = $alphabet[1];
$alphabet = substr($alphabet, 2);
$alphabet = strrev($alphabet);
$ret = [$prefix];

for ($i = 0; $i != count($numbers); $i++) {
$num = $numbers[$i];

$alphabetWithoutSeparator = substr($alphabet, 0, -1);
$ret[] = $this->toId($num, $alphabetWithoutSeparator);

$ret[] = $this->toId($num, substr($alphabet, 1));
if ($i < count($numbers) - 1) {
$separator = $alphabet[-1];

if ($partitioned && $i == 0) {
$ret[] = $partition;
} else {
$ret[] = $separator;
}

$ret[] = $alphabet[0];
$alphabet = $this->shuffle($alphabet);
}
}

$id = implode('', $ret);

if ($this->minLength > strlen($id)) {
if (!$partitioned) {
array_unshift($numbers, 0);
$id = $this->encodeNumbers($numbers, true);
}
$id .= $alphabet[0];

if ($this->minLength > strlen($id)) {
$id = $id[0] . substr($alphabet, 0, $this->minLength - strlen($id)) . substr($id, 1);
while ($this->minLength - strlen($id) > 0) {
$alphabet = $this->shuffle($alphabet);
$id .= substr($alphabet, 0, min($this->minLength - strlen($id), strlen($alphabet)));
}
}

if ($this->isBlockedId($id)) {
if ($partitioned) {
if ($numbers[0] + 1 > self::maxValue()) {
throw new \RuntimeException('Ran out of range checking against the blocklist');
} else {
$numbers[0] += 1;
}
} else {
array_unshift($numbers, 0);
}

$id = $this->encodeNumbers($numbers, true);
$id = $this->encodeNumbers($numbers, $increment + 1);
}

return $id;
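
This diff shows the opening steps of the new `encodeNumbers()` and the matching steps of `decode()` in full. The encoder derives an offset from the numbers, rotates the alphabet by that offset, keeps the first character as the prefix, and reverses the rotated alphabet. The decoder recovers the same rotation from the prefix. The hypothetical functions below restate only those steps. They take the alphabet as a parameter, whereas the class operates on `$this->alphabet`.

```php
<?php

/**
 * Hypothetical restatement of the offset/rotation steps in encodeNumbers().
 *
 * @param array<int> $numbers
 * @return array{0: string, 1: string} [prefix character, reversed rotated alphabet]
 */
function workingAlphabet(array $numbers, string $alphabet, int $increment = 0): array
{
    $length = strlen($alphabet);

    // The offset depends on the count, on each number's character and on its index.
    $offset = count($numbers);
    foreach ($numbers as $i => $v) {
        $offset += ord($alphabet[$v % $length]) + $i;
    }
    $offset = ($offset + $increment) % $length;

    // Rotate so the character at $offset comes first; it becomes the prefix.
    $rotated = substr($alphabet, $offset) . substr($alphabet, 0, $offset);

    return [$rotated[0], strrev($rotated)];
}

/**
 * Hypothetical restatement of the matching steps in decode():
 * the prefix's position in the alphabet is the offset the encoder used.
 */
function alphabetFromPrefix(string $prefix, string $alphabet): string
{
    $offset = strpos($alphabet, $prefix);
    if ($offset === false) {
        // The library's decode() returns an empty array for non-alphabet characters instead.
        throw new InvalidArgumentException('Prefix is not in the alphabet');
    }
    $rotated = substr($alphabet, $offset) . substr($alphabet, 0, $offset);

    return strrev($rotated);
}
```

After the reversal, the encoder uses `$alphabet[0]` as the separator and passes `substr($alphabet, 1)` to `toId()` as the digit alphabet. `decode()` does the same.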
@@ -735,7 +723,6 @@ protected function encodeNumbers(array $numbers, bool $partitioned = false): str
*
* These are the cases where the return value might be an empty array:
* - Empty ID / empty string
* - Invalid ID passed (reserved character is in the wrong place)
* - Non-alphabet character is found within the ID
*
* @param string $id Encoded ID
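
The docblock above lists the cases in which `decode()` returns an empty array. The decode hunk that follows relies on one invariant. Every number is written with `substr($alphabet, 1)`, and numbers are separated by `$alphabet[0]`, so the separator never occurs inside a chunk and the ID can be split on it directly. The sketch below demonstrates that invariant with a fixed alphabet and makes two assumptions. First, `toId()` and `toNumber()` are not part of this diff, so the hypothetical `toIdSketch()` and `toNumberSketch()` use plain positional base conversion with native integers. Second, the real code reshuffles the alphabet after each number (the body of `shuffle()` is not shown here), and this sketch leaves that step out.

```php
<?php

// Hypothetical stand-ins for toId()/toNumber(), which are not part of this diff:
// plain positional base-N conversion using native ints (no BC Math/GMP).
function toIdSketch(int $num, string $digits): string
{
    $base = strlen($digits);
    $id = '';
    do {
        $id = $digits[$num % $base] . $id;
        $num = intdiv($num, $base);
    } while ($num > 0);

    return $id;
}

function toNumberSketch(string $chunk, string $digits): int
{
    $num = 0;
    foreach (str_split($chunk) as $char) {
        $num = $num * strlen($digits) + strpos($digits, $char);
    }

    return $num;
}

// The separator is $alphabet[0]; numbers use only substr($alphabet, 1), so the
// separator never appears inside a chunk. The per-number reshuffle is omitted.
function joinNumbers(array $numbers, string $alphabet): string
{
    $digits = substr($alphabet, 1);

    return implode($alphabet[0], array_map(fn (int $n): string => toIdSketch($n, $digits), $numbers));
}

/** @return array<int> Empty for an empty ID or a character outside the alphabet. */
function splitNumbers(string $id, string $alphabet): array
{
    $digits = substr($alphabet, 1);
    $numbers = [];
    foreach (explode($alphabet[0], $id) as $chunk) {
        if ($chunk === '') {
            return $numbers; // like decode(): stop at an empty chunk
        }
        if (strspn($chunk, $digits) !== strlen($chunk)) {
            return []; // character outside the alphabet
        }
        $numbers[] = toNumberSketch($chunk, $digits);
    }

    return $numbers;
}
```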
@@ -759,29 +746,19 @@ public function decode(string $id): array
$prefix = $id[0];
$offset = strpos($this->alphabet, $prefix);
$alphabet = substr($this->alphabet, $offset) . substr($this->alphabet, 0, $offset);
$partition = $alphabet[1];
$alphabet = substr($alphabet, 2);
$alphabet = strrev($alphabet);
$id = substr($id, 1);

$partitionIndex = strpos($id, $partition);
if ($partitionIndex > 0 && $partitionIndex < strlen($id) - 1) {
$id = substr($id, $partitionIndex + 1);
$alphabet = $this->shuffle($alphabet);
}

while (strlen($id) > 0) {
$separator = $alphabet[-1];
$separator = $alphabet[0];

$chunks = explode($separator, $id, 2);
if (!empty($chunks)) {
$alphabetWithoutSeparator = substr($alphabet, 0, -1);
for ($i = 0; $i < strlen($chunks[0]); $i++) {
if (strpos($alphabetWithoutSeparator, $chunks[0][$i]) === false) {
return [];
}
if ($chunks[0] == '') {
return $ret;
}
$ret[] = $this->toNumber($chunks[0], $alphabetWithoutSeparator);

$ret[] = $this->toNumber($chunks[0], substr($alphabet, 1));
if (count($chunks) > 1) {
$alphabet = $this->shuffle($alphabet);
}
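
The early return on an empty chunk (`if ($chunks[0] == '')`) is what lets `decode()` ignore `minLength` padding. The encoder appends the current separator after the last number, then fills the remaining length with the leading characters of a freshly shuffled alphabet on each round. After reading that last number, the decoder reshuffles in the same way. The padding therefore starts with exactly the separator the decoder looks for next, which produces an empty first chunk. The sketch below reproduces the padding loop from `encodeNumbers()`. `padId()` is a hypothetical name, and because the body of `shuffle()` is not shown in this diff, the example substitutes a rotate-by-one function.

```php
<?php

/**
 * Hypothetical restatement of the minLength padding in encodeNumbers().
 * $alphabet is the working alphabet in effect after the last number was written.
 *
 * @param callable(string): string $shuffle deterministic stand-in for Sqids::shuffle()
 */
function padId(string $id, string $alphabet, int $minLength, callable $shuffle): string
{
    if ($minLength > strlen($id)) {
        // Terminate the last number with the current separator.
        $id .= $alphabet[0];

        while ($minLength - strlen($id) > 0) {
            // Each round reshuffles and appends as many characters as are still needed.
            $alphabet = $shuffle($alphabet);
            $id .= substr($alphabet, 0, min($minLength - strlen($id), strlen($alphabet)));
        }
    }

    return $id;
}
```

In the example, `'fe'` is only a placeholder for an already-encoded ID. With a rotate-by-one stand-in, `padId('fe', 'abcdef', 10, $rotate)` appends the separator `a`, then `bcdefa`, then `c`. The first padding character, `b`, is the first character of the reshuffled alphabet.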
@@ -793,16 +770,6 @@ public function decode(string $id): array
return $ret;
}

public static function minValue(): int
{
return 0;
}

public static function maxValue(): int
{
return PHP_INT_MAX;
}

protected function shuffle(string $alphabet): string
{
$chars = str_split($alphabet);
@@ -864,6 +831,11 @@ protected function isBlockedId(string $id): bool
return false;
}

protected static function maxValue(): int
{
return PHP_INT_MAX;
}

/**
* Get BC Math or GMP extension.
* @throws \RuntimeException
6 changes: 0 additions & 6 deletions src/SqidsInterface.php
@@ -26,10 +26,4 @@ public function encode(array $numbers): string;
* @return array<int>
*/
public function decode(string $id): array;

/** Get the smallest supported integer that's possible to encode. */
public static function minValue(): int;

/** Get the largest supported integer that's possible to encode. */
public static function maxValue(): int;
}
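
`minValue()` and `maxValue()` have been removed from `SqidsInterface`, and `maxValue()` is now `protected` in `Sqids`. Code that used these methods to pre-validate input has to check the range itself. The following is a minimal sketch, assuming the bounds stated in this commit (0 to `PHP_INT_MAX`). `isEncodable()` is hypothetical and stricter than `encode()`: it also rejects non-`int` values such as floats and numeric strings, which a plain range comparison would accept.

```php
<?php

/**
 * Hypothetical caller-side check matching the documented range of encode():
 * every value must be an int between 0 and PHP_INT_MAX.
 */
function isEncodable(array $numbers): bool
{
    foreach ($numbers as $n) {
        // An int can never exceed PHP_INT_MAX, so only the lower bound needs checking.
        if (!is_int($n) || $n < 0) {
            return false;
        }
    }

    return true;
}
```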
12 changes: 9 additions & 3 deletions tests/SqidsAlphabetTest.php
@@ -22,15 +22,15 @@ public function testSimple()
$sqids = new Sqids('0123456789abcdef');

$numbers = [1, 2, 3];
$id = '4d9fd2';
$id = '489158';

$this->assertSame($id, $sqids->encode($numbers));
$this->assertSame($numbers, $sqids->decode($id));
}

public function testShortAlphabet()
{
$sqids = new Sqids('abcde');
$sqids = new Sqids('abc');

$numbers = [1, 2, 3];
$this->assertSame($numbers, $sqids->decode($sqids->encode($numbers)));
@@ -44,6 +44,12 @@ public function testLongAlphabet()
$this->assertSame($numbers, $sqids->decode($sqids->encode($numbers)));
}

public function testMultibyteCharacters()
{
$this->expectException(InvalidArgumentException::class);
new Sqids('ë1092');
}

public function testRepeatingAlphabetCharacters()
{
$this->expectException(InvalidArgumentException::class);
@@ -53,6 +59,6 @@ public function testRepeatingAlphabetCharacters()
public function testTooShortAlphabet()
{
$this->expectException(InvalidArgumentException::class);
new Sqids('abcd');
new Sqids('ab');
}
}