A containerised version of the tools required to train or fine-tune Tesseract for a new font.
Based on: https://www.youtube.com/watch?v=TpD76k2HYms
- Clone this repo (`git clone https://github.com/artdevgame/tesseract-trainer.git`)
- Copy your selected font into the `src/fonts` directory
- Configure `docker-compose.yml` with your preferences (see below)
- Download and install Docker for your OS (https://www.docker.com/products/docker-desktop)
- From the project root directory, run `docker-compose up`
- After the process has finished, you will have a `final.traineddata` in the `src/output` directory. Use this in your Tesseract project.
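
The copy, train, and check steps above can be sketched as a pair of shell helpers. This is only a sketch under stated assumptions: the function names `has_trained_model` and `run_training` are hypothetical and not part of this repo, and it assumes Docker is installed and the repo has already been cloned.

```shell
# Hypothetical helpers; not part of the repository.

# Succeeds if a non-empty final.traineddata exists under the given project root.
has_trained_model() {
    [ -s "$1/src/output/final.traineddata" ]
}

# Copies a font file into src/fonts, runs the training containers,
# then reports whether the trained model was produced.
# Usage: run_training <project-root> <path-to-font-file>
run_training() {
    cp "$2" "$1/src/fonts/" || return 1
    (cd "$1" && docker-compose up) || return 1
    has_trained_model "$1"
}
```

For example, `run_training ./tesseract-trainer ~/fonts/MyFont.ttf` (both paths are placeholders). If Tesseract is installed on the host, the resulting model can probably be loaded with something like `tesseract sample.png stdout --tessdata-dir src/output -l final`, where `sample.png` is a placeholder image.
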
Change the following environment values in docker-compose.yml:
Property | Example | Description |
---|---|---|
TESSTRAIN_FONT | Agency FB Condensed | The name of the font (not the filename) |
TESSTRAIN_LANG | eng | The language of the training data |
TESSTRAIN_MAX_PAGES | 10 | Training text size (maximum number of pages to generate) |
TESSTRAIN_MAX_ITERATIONS | 400 | Number of training iterations for the neural network. More iterations can give a better result but may also lead to overfitting |
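
After editing these values, it can help to confirm what Docker Compose will actually pass to the container. The sketch below filters the resolved configuration down to the `TESSTRAIN_*` entries; the helper name `filter_tesstrain` is hypothetical, and the exact output layout depends on your Compose version.

```shell
# Hypothetical helper: keep only the TESSTRAIN_* lines from text on stdin.
filter_tesstrain() {
    grep 'TESSTRAIN_'
}

# Usage (requires Docker Compose; run from the project root):
#   docker-compose config | filter_tesstrain
```

`docker-compose config` also validates the file, so a YAML syntax error introduced while editing should surface here rather than partway through `docker-compose up`.
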