Add Encoding::for()
thekid committed Oct 13, 2024
1 parent 34876fa commit 9eaaeda
Showing 3 changed files with 30 additions and 4 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -12,7 +12,7 @@ This library implements OpenAI APIs.

 TikToken
 --------
-Encoding text to tokens. Download the [cl100k_base](https://openaipublic.blob.core.windows.net/encodings/cl100k_base.tiktoken) and [o200k_base](https://openaipublic.blob.core.windows.net/encodings/o200k_base.tiktoken) vocabularies first!
+Encodes text to tokens. Download the [cl100k_base](https://openaipublic.blob.core.windows.net/encodings/cl100k_base.tiktoken) and [o200k_base](https://openaipublic.blob.core.windows.net/encodings/o200k_base.tiktoken) vocabularies first!

 ```php
 use com\openai\{Encoding, TikTokenFilesIn};
30 changes: 28 additions & 2 deletions src/main/php/com/openai/Encoding.class.php
@@ -1,8 +1,14 @@
 <?php namespace com\openai;

-use lang\Enum;
+use lang\{Enum, IllegalArgumentException};

-/** @see https://github.com/openai/tiktoken/blob/main/tiktoken_ext/openai_public.py */
+/**
+ * Encoding enumeration, supporting `r50k_base`, `p50k_base`, `cl100k_base` and
+ * `o200k_base`.
+ *
+ * @see https://github.com/openai/tiktoken/blob/main/tiktoken_ext/openai_public.py
+ * @test com.openai.unittest.EncodingTest
+ */
 class Encoding extends Enum {
   const ENDOFTEXT = '<|endoftext|>';
   const FIM_PREFIX = '<|fim_prefix|>';
@@ -55,4 +61,24 @@ public function load(Source $source): Encoder {
   public static function named(string $name): self {
     return parent::valueOf(self::class, $name);
   }
+
+  /**
+   * Returns an encoding for a given model
+   *
+   * @throws lang.IllegalArgumentException
+   */
+  public static function for(string $model): self {
+    static $models= [
+      '/^o1/' => 'o200k_base',
+      '/^gpt-4o/' => 'o200k_base',
+      '/^gpt-4/' => 'cl100k_base',
+      '/^gpt-3\.?5/' => 'cl100k_base',
+    ];
+
+    foreach ($models as $pattern => $name) {
+      if (preg_match($pattern, $model)) return self::named($name);
+    }
+
+    throw new IllegalArgumentException('Unknown model "'.$model.'"');
+  }
 }
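
The lookup added in `Encoding::for()` walks an ordered pattern table and returns the first match, so the more specific `/^gpt-4o/` has to come before the broader `/^gpt-4/`. The following is a minimal standalone sketch of that idea, not the library's code: `encodingNameFor()` is a hypothetical helper that returns the encoding name as a plain string and throws PHP's built-in `InvalidArgumentException` rather than `lang.IllegalArgumentException`. The dot in the last pattern is escaped here so that it only matches a literal `.`.

```php
<?php
// Hypothetical helper for illustration only; the commit's Encoding::for()
// returns an Encoding enum member instead of a string.
function encodingNameFor(string $model): string {

  // Checked in insertion order, so specific prefixes precede broader ones.
  static $models= [
    '/^o1/'        => 'o200k_base',
    '/^gpt-4o/'    => 'o200k_base',
    '/^gpt-4/'     => 'cl100k_base',
    '/^gpt-3\.?5/' => 'cl100k_base',  // matches "gpt-3.5" and "gpt-35"
  ];

  foreach ($models as $pattern => $name) {
    if (preg_match($pattern, $model)) return $name;
  }

  throw new InvalidArgumentException('Unknown model "'.$model.'"');
}

echo encodingNameFor('gpt-4o-mini'), "\n";    // o200k_base
echo encodingNameFor('gpt-4-turbo'), "\n";    // cl100k_base
echo encodingNameFor('gpt-3.5-turbo'), "\n";  // cl100k_base
```

If the two `gpt-4` entries were swapped, `gpt-4o-mini` would match `/^gpt-4/` first and resolve to `cl100k_base`.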
2 changes: 1 addition & 1 deletion src/main/php/com/openai/TikTokenFilesIn.class.php
@@ -10,7 +10,7 @@
  * @see https://openaipublic.blob.core.windows.net/encodings/p50k_base.tiktoken - Curie, Code
  * @see https://openaipublic.blob.core.windows.net/encodings/cl100k_base.tiktoken - GPT 3.5 / 4.0
  * @see https://openaipublic.blob.core.windows.net/encodings/o200k_base.tiktoken - o1, Omni
- * @test com.openai.unittest.FromTikTokenTest
+ * @test com.openai.unittest.TikTokenFilesInTest
  */
 class TikTokenFilesIn extends Source {
   private $folder;