release stage: "fuzzy-potato" (alpha)
⚠ Marked for deprecation. Please visit https://github.com/brain-score/language for the migrated project. ⚠
- Unit Tests: this report lists tests used for the module and their outcomes: success/failure (using pytest)
- Code Coverage: describes what parts of the code are tested down to individual lines (using Coverage.py)
- Static Type Checks: results of static type-checking of the code where type annotations are available (using Mypy)
Provides a library for systematic comparison of encoder representations in the most general sense.
An encoder is an entity that encodes linguistic input (e.g., text), and returns a representation of it
(typically in high-dimensional space).
We envision encoders to be either human brains or artificial neural networks (ANNs).
Humans see textual stimuli on a screen, which leads to certain computations in the brain
that can be measured using several proxies, such as fMRI, EEG, and ECoG. Similarly, ANNs process textual input
in the form of vectors and output either some sort of embeddings or latent vectors, all
meant to be useful representations of the input for downstream tasks.
In this project, and in this general family of research projects, we want to evaluate the similarity between various ways of generating representations of input stimuli. We are also interested in eventually understanding what kind of representations the brain employs and how we can move closer to them; building models helps us travel in that direction.
may be interested in developing better models of brain activation to understand what kind of stimuli drive response in certain parts of the brain. While similar efforts exist in the vision domain, in this project, we target language processing in the brain. We provide ways to use several existing fMRI datasets as benchmarks for computing a language-brainscore. We also provide ways to work with your own data and test ANN models against this data.
researchers may be interested in comparing how similar representations are across various ANN models, particularly models they develop or study. They may also be interested in creating increasingly more cognitively plausible models of natural language understanding. Although language-brainscore is not a direct measure of the cognitive plausibility of ANN models, it provides a possible direction to optimize towards.
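To make "comparing representations" concrete, here is a minimal sketch of one common approach, representational similarity analysis (RSA), which compares how two encoders arrange the same stimuli relative to one another, even when their representations have different dimensionality. This is an illustration on made-up toy data, not this library's API: it does not use langbrainscore, and the names `pearson`, `dissimilarities`, and `rsa_score` are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length sequences.

    Assumes neither sequence is constant (otherwise the variance is zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def dissimilarities(representations):
    """Pairwise (1 - Pearson r) between stimuli, in a fixed pair order."""
    return [1.0 - pearson(a, b) for a, b in combinations(representations, 2)]


def rsa_score(reps_a, reps_b):
    """Correlate the stimulus-by-stimulus dissimilarity structure of two encoders."""
    return pearson(dissimilarities(reps_a), dissimilarities(reps_b))


# Toy data (made up): 4 stimuli encoded by two hypothetical encoders,
# one with 3 features per stimulus and one with 4.
encoder_a = [[1.0, 0.0, 2.0], [0.9, 0.1, 2.1], [0.0, 3.0, 1.0], [0.2, 2.8, 0.7]]
encoder_b = [
    [5.0, 1.0, 0.5, 2.0],
    [4.8, 1.3, 0.4, 2.2],
    [1.0, 6.0, 3.0, 0.5],
    [1.4, 5.5, 3.2, 0.3],
]

score = rsa_score(encoder_a, encoder_b)
print(f"RSA score: {score:.3f}")
```

Because both toy encoders place stimuli 1 and 2 close together and stimuli 3 and 4 close together, the score comes out strongly positive, even though the two feature spaces are not directly comparable.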
(make sure to install the package first: jump to the install section of this README)
This project has examples hosted on binder. Simply click on the binder launch button to view a Jupyter notebook
with example usage.
Alternatively, take a peek at the examples/ directory for scripts as well as notebooks.
The following is a schematic of the library usage. Note that it is not a minimal working example (MWE). You will
find MWEs in examples/.
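import langbrainscore as lbs

# placeholder: the dataset is left unspecified in this schematic
pereira18_data = ...

gpt2 = lbs.encoder.HuggingFaceEncoder('gpt2')
brain = lbs.encoder.BrainEncoder()

# both encoders expose the same encode() call
for encoder in [brain, gpt2]:
    print(encoder.encode(pereira18_data).shape)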
Install this project using PyPI (not up-to-date; not recommended as of now)
python3 -m pip install langbrainscore
This project uses poetry for dependency management and packaging
for development purposes (you don't need poetry to install it as a library/package from PyPI).
Why? poetry allows running the application in a virtual environment while abstracting away which
virtual environment you use, e.g. conda or virtualenv (or one of the other less common alternatives).
- In order to set up your environment, obtain poetry, a lightweight Python package, on your machine.
  $ curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/1.1.10/get-poetry.py | python3 -
  <OR>
  $ make poetry
- If you want to have a dedicated conda environment for this project, create one now (or use an existing one). Else, let poetry create a venv.
  (base) $ conda create -n langbrainscore-env python=3.8
  (base) $ conda activate langbrainscore-env
  (langbrainscore-env) $
  <OR>
  $ poetry shell
  (.venv) $
  <OR>
  $ make venv
  (.venv) $
- Now use poetry to install the package and dependencies by navigating inside the repository.
  (langbrainscore-env) $ poetry install
  <OR>
  (.venv) $ make install
- Before running a script using langbrainscore, make sure to activate your environment, or type poetry shell to create a venv.
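If a script fails with an import error, a quick sanity check like the one below can tell you whether an environment is active and whether the package is visible from it. This is a sketch using only the Python standard library. The helper names are hypothetical, and the conda check assumes that conda sets the `CONDA_PREFIX` variable on activation.

```python
import importlib.util
import os
import sys


def in_virtual_env():
    """True inside a venv/virtualenv, where sys.prefix differs from the base install."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def in_conda_env():
    """True when a conda environment appears to be active (assumes CONDA_PREFIX is set)."""
    return bool(os.environ.get("CONDA_PREFIX"))


def package_available(name):
    """True when the top-level package `name` can be found by the import system."""
    return importlib.util.find_spec(name) is not None


print("venv active:      ", in_virtual_env())
print("conda env active: ", in_conda_env())
print("langbrainscore:   ", package_available("langbrainscore"))
```

If the last line prints `False`, the environment you activated is probably not the one the package was installed into.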
Use a Docker image with the package and all dependencies pre-installed!
aloxatel/langbrainscore (Debian-Ubuntu 20.04 derivative), available on Docker Hub.
Alternatively, use the pyproject.toml file to create your own environment from scratch.
We follow the Semantic Versioning spec (semver.org v2.0.0):
Given a version number MAJOR.MINOR.PATCH, increment the:
- MAJOR version when you make incompatible API changes,
- MINOR version when you add functionality in a backwards compatible manner, and
- PATCH version when you make backwards compatible bug fixes.
Additional labels for pre-release and build metadata are available as extensions to the MAJOR.MINOR.PATCH format.
Additionally:
Major version zero (0.y.z) is for initial development. Anything MAY (and will) change at any time. The public API SHOULD NOT be considered stable. [ref]
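The increment rules above can be sketched as a small helper. This is an illustration only and not part of langbrainscore: `parse_version`, `bump`, and `is_stable` are hypothetical names, and pre-release and build metadata labels are deliberately not handled.

```python
def parse_version(version):
    """Split a plain 'MAJOR.MINOR.PATCH' string into three integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def bump(version, part):
    """Return the next version after incrementing `part`.

    Per the Semantic Versioning spec, lower-order fields reset to 0
    when a higher-order field is incremented.
    """
    major, minor, patch = parse_version(version)
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


def is_stable(version):
    """Major version zero (0.y.z) is initial development, so not stable."""
    return parse_version(version)[0] > 0


print(bump("0.1.2", "patch"))  # backwards compatible bug fix
print(bump("0.1.2", "minor"))  # backwards compatible new functionality
print(bump("0.1.2", "major"))  # incompatible API change
print(is_stable("0.1.2"))
```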