Transformer

Overview

transformer is pure Go package to facilitate applying Natural Language Processing (NLP) models train/test and inference in Go.

This package is in active mode of building and there are many changes ahead. Hence you can use it with your complete own risk. The package will be considered as stable when version 1.0 is released.

transformer is heavily inspired by and based on the popular Python HuggingFace Transformers. It's also influenced by Rust version - rust-bert. In fact, all pre-trained models for Rust are compatible to import to this Go transformer package as both rust-bert's dependency Pytorch Rust binding - tch-rs and Go binding gotch are built with similar principles.

transformer is part of an ambitious goal (together with tokenizer and gotch) to bring more AI/deep-learning tools to Gophers so that they can stick to the language they love and good at and build faster software in production.

Dependencies

2 main dependencies are:

tokenizer
gotch

Prerequisites and installation

As this package depends on gotch which is a Pytorch C++ API binding for Go, a pre-compiled Libtorch copy (CPU or GPU) should be installed in your machine. Please see gotch installation instruction for detail.
Install package: go get -u github.com/sugarme/transformer

Basic example

    import (
        "fmt"
        "log"

        "github.com/sugarme/gotch"
        ts "github.com/sugarme/gotch/tensor"
        "github.com/sugarme/tokenizer"

        "github.com/sugarme/transformer/bert"
    )

    func main() {
        var config *bert.BertConfig = new(bert.BertConfig)
        if err := transformer.LoadConfig(config, "bert-base-uncased", nil); err != nil {
            log.Fatal(err)
        }

        var model *bert.BertForMaskedLM = new(bert.BertForMaskedLM)
        if err := transformer.LoadModel(model, "bert-base-uncased", config, nil, gotch.CPU); err != nil {
            log.Fatal(err)
        }

        var tk *bert.Tokenizer = bert.NewTokenizer()
        if err := tk.Load("bert-base-uncased", nil); err != nil{
            log.Fatal(err)
        }

        sentence1 := "Looks like one [MASK] is missing"
        sentence2 := "It was a very nice and [MASK] day"

        var input []tokenizer.EncodeInput
        input = append(input, tokenizer.NewSingleEncodeInput(tokenizer.NewInputSequence(sentence1)))
        input = append(input, tokenizer.NewSingleEncodeInput(tokenizer.NewInputSequence(sentence2)))

        encodings, err := tk.EncodeBatch(input, true)
        if err != nil {
            log.Fatal(err)
        }

        var maxLen int = 0
        for _, en := range encodings {
            if len(en.Ids) > maxLen {
                maxLen = len(en.Ids)
            }
        }

        var tensors []ts.Tensor
        for _, en := range encodings {
            var tokInput []int64 = make([]int64, maxLen)
            for i := 0; i < len(en.Ids); i++ {
                tokInput[i] = int64(en.Ids[i])
            }

            tensors = append(tensors, ts.TensorFrom(tokInput))
        }

        inputTensor := ts.MustStack(tensors, 0).MustTo(device, true)
        var output ts.Tensor
        ts.NoGrad(func() {
            output, _, _ = model.ForwardT(inputTensor, ts.None, ts.None, ts.None, ts.None, ts.None, ts.None, false)
        })
        index1 := output.MustGet(0).MustGet(4).MustArgmax(0, false, false).Int64Values()[0]
        index2 := output.MustGet(1).MustGet(7).MustArgmax(0, false, false).Int64Values()[0]

        got1, ok := tk.IdToToken(int(index1))
        if !ok {
            fmt.Printf("Cannot find a corresponding word for the given id (%v) in vocab.\n", index1)
        }
        got2, ok := tk.IdToToken(int(index2))
        if !ok {
            fmt.Printf("Cannot find a corresponding word for the given id (%v) in vocab.\n", index2)
        }

        fmt.Println(got1)
        fmt.Println(got2)
        
        // Output:
        // person
        // pleasant
    }

Getting Started

See pkg.go.dev for detail APIs

License

transformer is Apache 2.0 licensed.

Acknowledgement

This project has been inspired and used many concepts from Python HuggingFace Transformers and Rust version - rust-bert.
Pre-trained models and configurations are downloaded remotely from HuggingFace.

Name		Name	Last commit message	Last commit date
Latest commit History 116 Commits
bert		bert
example		example
pipeline		pipeline
pretrained		pretrained
roberta		roberta
util		util
.gitignore		.gitignore
.travis.yml		.travis.yml
CHANGELOG.md		CHANGELOG.md
LICENSE		LICENSE
README.md		README.md
config-example_test.go		config-example_test.go
config.go		config.go
config_test.go		config_test.go
coverage.out		coverage.out
go.mod		go.mod
go.sum		go.sum
modeling.go		modeling.go
modeling_test.go		modeling_test.go
pipeline.go		pipeline.go
tokenizer.go		tokenizer.go
trainer.go		trainer.go

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Transformer

Overview

Dependencies

Prerequisites and installation

Basic example

Getting Started

License

Acknowledgement

About

Releases

Packages

Languages

License

sugarme/transformer

Folders and files

Latest commit

History

Repository files navigation

Transformer

Overview

Dependencies

Prerequisites and installation

Basic example

Getting Started

License

Acknowledgement

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages