Is it possible to fine tune with our own datasets? #413

ninedesu · 2024-11-22T02:01:18Z

I'd like to know whether we can use our own dataset to fine-tune the OCR.

PeterStaar-IBM · 2024-11-22T05:03:53Z · Maintainer

@ninedesu This is an excellent question, and yes: we plan to build a community where people can contribute data for fine-tuning. At the moment, we are gathering all our internal and external datasets (e.g. https://huggingface.co/datasets/ds4sd/DocLayNet) and preparing them so that we can share them on the Hugging Face website!

With regard to OCR, we still have some work to do; right now we rely on third-party OCR.

bit-scientist Dec 13, 2024

@PeterStaar-IBM, is there any update on custom training guidelines?
