A data engineering toolkit to extract metadata and replays from api.faforever.com
and load them into a data lake like BigQuery. The intention is to reconstruct part of the Forged Alliance Forever database as a public BigQuery dataset.
Using this toolkit, I've scraped the API and created a dataset of all `game`
models and some associated models (`player`, `gamePlayerStats`, `mapVersion`, etc.).
It lets you make stuff like this:
At the time of this writing, there are three public ways to use this dataset:
- A simple Datastudio Dashboard for quick browsing
- A Kaggle dataset where I've flattened, filtered and documented two CSVs
- A publicly accessible BigQuery dataset for your own queries (← the good stuff is here)
The toolkit includes utilities to extract, transform and load FAF metadata and replay data. Here's a demo session using `faf.extract` and `faf.transform` to create a BigQuery table:
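The transform step can be sketched roughly like this. This is a minimal illustration of the idea, not the actual `faf.transform` implementation; the JSON:API page layout and field names below are assumptions:

```python
import json

def jsonapi_to_jsonl(pages, out_path):
    """Flatten extracted JSON:API pages into newline-delimited JSON,
    the format BigQuery's load jobs accept.

    Each resource becomes one row: its id plus its attributes."""
    with open(out_path, "w") as out:
        for page in pages:
            for resource in page.get("data", []):
                row = {"id": resource["id"], **resource.get("attributes", {})}
                out.write(json.dumps(row) + "\n")

# Example: one fake extracted page with two `game` resources.
pages = [{
    "data": [
        {"id": "1", "attributes": {"name": "game one", "validity": "VALID"}},
        {"id": "2", "attributes": {"name": "game two", "validity": "VALID"}},
    ]
}]
jsonapi_to_jsonl(pages, "games.jsonl")
```

One row per line (rather than one big JSON array) is what lets BigQuery ingest the file as `NEWLINE_DELIMITED_JSON`.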
An overview of all utilities:
- `faf.extract`: Scrapes models from api.faforever.com, storing them as JSONs on disk.
- `faf.transform`: Transforms extracted JSON files into JSONL files ready for loading into a data lake.
- `faf.parse`: Parses a downloaded `.fafreplay` file into a `.pickle`; this speeds up subsequent dumps of the replay.
- `faf.dump`: Dumps the content of a replay (raw `.fafreplay` or pre-parsed `.pickle`) into a JSONL file to be loaded into the lake.
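The parse/dump split is a simple caching pattern: parsing the binary replay is the expensive step, so its result is pickled once and reused by later dumps. A minimal sketch of the idea, with a stand-in parser (the real `.fafreplay` format is binary and much more involved):

```python
import json
import pickle

def parse_replay(path):
    # Stand-in for the real (slow) .fafreplay parser.
    with open(path) as f:
        return {"commands": f.read().split()}

def parse(replay_path, pickle_path):
    """faf.parse sketch: parse once, cache the result as a pickle."""
    parsed = parse_replay(replay_path)
    with open(pickle_path, "wb") as f:
        pickle.dump(parsed, f)

def dump(path, out_path):
    """faf.dump sketch: accept either a raw replay or a pre-parsed pickle."""
    if path.endswith(".pickle"):
        with open(path, "rb") as f:
            parsed = pickle.load(f)  # fast path: skip re-parsing
    else:
        parsed = parse_replay(path)
    with open(out_path, "w") as f:
        f.write(json.dumps(parsed) + "\n")
```

With this shape, a one-off dump can go straight from `.fafreplay`, while repeated dumps (e.g. after changing the output schema) pay the parse cost only once.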
This is a bit of a fork/rewrite of fafalytics, another project of mine with a much larger scope (not just scraping the API, but also downloading and analysing the binary replay files). I now think it's better to approach this as three smaller-scoped projects: one for data engineering, one for dataviz and analytics, and one for ML.