Update README.md
proger authored Apr 16, 2023
1 parent dbaf7a2 commit a63c2e6
Showing 1 changed file with 1 addition and 1 deletion.
README.md (1 addition, 1 deletion)

@@ -1,4 +1,4 @@
-# uk4b: Metadata Pretraining Towards Instruction Finetuning
+# Metadata Pretraining Towards Instruction Finetuning

 We pretrain unidirectional language models on 4B tokens from [UberText 2.0](https://lang.org.ua/en/ubertext/). We enrich document text with weakly structured metadata, such as title, tags, and publication year, enabling metadata-conditioned text generation and text-conditioned metadata prediction at the same time. We pretrain GPT-2 Small, Medium, and Large models on a single GPU, reporting training times, BPC on BrUK, BERTScore, and BLEURT on titles for 1000 News from the Future.

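As a rough illustration of the metadata-conditioning idea described in the README excerpt above: the exact serialization format used by uk4b is not shown in this commit, so the field markers, function name, and example data below are assumptions for illustration only, not the repository's actual preprocessing code.

```python
# Hypothetical sketch: flattening a document plus weakly structured metadata
# into a single training string for a unidirectional LM. Field markers and
# ordering are assumptions; they are not taken from the uk4b repository.

def serialize(doc: dict) -> str:
    """Serialize metadata followed by body text.

    Placing metadata before the body lets the model learn
    metadata-conditioned text generation; conversely, prompting the trained
    model with body text and a metadata field marker yields
    text-conditioned metadata prediction.
    """
    parts = []
    if doc.get("year"):
        parts.append(f"рік: {doc['year']}")        # publication year
    if doc.get("tags"):
        parts.append("теги: " + ", ".join(doc["tags"]))  # tags
    if doc.get("title"):
        parts.append(f"назва: {doc['title']}")     # title
    parts.append(doc["text"])                      # document body
    return "\n".join(parts) + "\n<|endoftext|>"


example = {
    "year": 2023,
    "tags": ["наука", "мовні моделі"],
    "title": "Передтренування з метаданими",
    "text": "Ми навчаємо однонапрямлені мовні моделі на 4 млрд токенів...",
}
print(serialize(example))
```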
