# fafdata


A data engineering toolkit to extract metadata and replays from api.faforever.com and load them into a data lake like BigQuery. The intention is to reconstruct (part of) the Forged Alliance Forever database as a public BigQuery dataset.

## The dataset

Using this toolkit, I've scraped the API and created a dataset of all `game` models and some associated models (`player`, `gamePlayerStats`, `mapVersion`, etc.).

It lets you make stuff like this: *scatter plot panels*

At the time of this writing, there are three public ways to use this dataset:

- A simple Data Studio dashboard for quick browsing
- A Kaggle dataset where I've flattened, filtered and documented two CSVs
- A publicly accessible BigQuery dataset for your own queries (← the good stuff is here)
  - Try this query (you might pay a tiny amount for it):

    ```sql
    SELECT COUNT(id) FROM `fafalytics.faf.games` WHERE DATE(startTime) = "2022-01-01"
    ```

  - Try pinning the `fafalytics` project in Cloud Console
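If you'd rather query from code than from the Cloud Console, the same sample query can be run with the BigQuery Python client. This is a minimal sketch, not part of the toolkit: the `games_on_date_query` helper is hypothetical, and running it against BigQuery requires the `google-cloud-bigquery` package plus GCP credentials, so the client call is shown only as a comment.

```python
def games_on_date_query(date: str) -> str:
    """Build the README's example query: count games started on a given date."""
    return (
        "SELECT COUNT(id) AS n_games "
        "FROM `fafalytics.faf.games` "
        f"WHERE DATE(startTime) = '{date}'"
    )

print(games_on_date_query("2022-01-01"))

# To actually run it (needs google-cloud-bigquery and GCP credentials):
#   from google.cloud import bigquery
#   client = bigquery.Client()
#   for row in client.query(games_on_date_query("2022-01-01")).result():
#       print(row.n_games)
```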

## The utilities

The toolkit includes utilities to extract, transform and load FAF metadata and replay data. Here's a demo session using `faf.extract` and `faf.transform` to create a BigQuery table:

*(demo: from faforever to BigQuery in 30s)*

An overview of all utilities:

- `faf.extract`: Scrapes models from api.faforever.com, storing them as JSON files on disk.
- `faf.transform`: Transforms extracted JSON files into JSONL files ready for loading into a data lake.
- `faf.parse`: Parses a downloaded `.fafreplay` file into a `.pickle`; this speeds up subsequent dumps of the replay.
- `faf.dump`: Dumps the content of a replay (raw `.fafreplay` or pre-parsed `.pickle`) into a JSONL file to be loaded into the lake.
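The transform step boils down to producing newline-delimited JSON (one record per line), which is the format BigQuery's load jobs accept. Here's a minimal sketch of that idea; the field layout is illustrative and `to_jsonl` is a hypothetical helper, not the toolkit's actual code or schema:

```python
import json
from typing import Iterable


def to_jsonl(documents: Iterable[dict]) -> str:
    """Serialize one record per line, the format expected by BigQuery
    load jobs with source_format=NEWLINE_DELIMITED_JSON."""
    return "\n".join(json.dumps(doc, sort_keys=True) for doc in documents)


# Illustrative records shaped loosely like scraped game metadata.
games = [
    {"id": "100", "startTime": "2022-01-01T12:00:00Z"},
    {"id": "101", "startTime": "2022-01-01T13:30:00Z"},
]
print(to_jsonl(games))
```

The resulting file can then be loaded with `bq load --source_format=NEWLINE_DELIMITED_JSON`, or via the client libraries.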

## Epilogue

This is a bit of a fork/rewrite of fafalytics, another project of mine with a much larger scope (not just scraping the API, but also downloading and analysing the binary replay files). I now think it's better to approach this with three smaller-scoped projects: one for data engineering, one for dataviz and analytics, and one for ML.