Temporal Expression Identification to Go (TEI2GO) is an approach for fast and effective identification of temporal expressions. Currently, TEI2GO has models for six languages:
- German
- English
- Spanish
- Italian
- French
- Portuguese
However, it can be extended to other languages. If you would like to add a language, feel free to open an issue, fork the repo, and submit a pull request.
To facilitate usage, all TEI2GO models are published on the Hugging Face Hub. The steps below show how to install and load the French model.
On the command line, run:
pip install https://huggingface.co/hugosousa/fr_tei2go/resolve/main/fr_tei2go-any-py3-none-any.whl
Then the model can be loaded in two ways:
- Using spaCy
import spacy
nlp = spacy.load("fr_tei2go")
- Importing as a module
import fr_tei2go
nlp = fr_tei2go.load()
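Once the model is loaded, it can be applied to text like any other spaCy pipeline. The sketch below is a minimal illustration, not part of the TEI2GO code base: it assumes the identified temporal expressions are exposed as entity spans in `doc.ents`, as spaCy named entity recognition pipelines do, and the helper name `extract_timexes` is hypothetical.

```python
def extract_timexes(nlp, text):
    """Run a loaded pipeline on `text` and list the entity spans it finds.

    Assumption: the TEI2GO pipeline reports temporal expressions through
    `doc.ents`. `extract_timexes` is an illustrative helper, not a
    function provided by TEI2GO or spaCy.
    """
    doc = nlp(text)
    # Each span carries its surface text, character offsets, and label.
    return [
        (ent.text, ent.start_char, ent.end_char, ent.label_)
        for ent in doc.ents
    ]
```

With the French model installed, `extract_timexes(fr_tei2go.load(), "Le rapport sera publié le 3 mars 2024.")` would return one tuple per expression the model identifies. The character offsets make it straightforward to highlight the expressions in the original text.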
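To set up a development environment, create and activate a virtual environment, then install the dependencies:
virtualenv venv --python=python3.8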
source venv/bin/activate
pip install -r requirements.txt
To check that everything is working, run pytest:
python -m pytest tests
python -m src.run spacy --data tempeval_3 ph_english --language en
To download the models:
cd models
sh download.sh
To download the resources:
cd resources
sh download.sh
Hugo Sousa - [email protected]
This framework is part of the Text2Story project, which is financed by the ERDF – European Regional Development Fund through the North Portugal Regional Operational Programme (NORTE 2020), under PORTUGAL 2020, and by National Funds through the Portuguese funding agency, FCT - Fundação para a Ciência e a Tecnologia, within project PTDC/CCI-COM/31857/2017 (NORTE-01-0145-FEDER-03185).
If you use this work, please cite the following paper:
@inproceedings{10.1145/3583780.3615130,
author = {Sousa, Hugo and Campos, Ricardo and Jorge, Al\'{\i}pio},
title = {TEI2GO: A Multilingual Approach for Fast Temporal Expression Identification},
year = {2023},
isbn = {9798400701245},
publisher = {Association for Computing Machinery},
url = {https://doi.org/10.1145/3583780.3615130},
doi = {10.1145/3583780.3615130},
booktitle = {Proceedings of the 32nd ACM International Conference on Information and Knowledge Management},
pages = {5401–5406},
numpages = {6},
keywords = {temporal expression identification, multilingual corpus, weak label},
location = {Birmingham, United Kingdom},
series = {CIKM '23}
}