REST API for managing dataset metadata, versions, editions and distributions.
- Install Serverless Framework
- Set up a venv:
    python3 -m venv .venv
    source .venv/bin/activate
    pip install -r requirements.txt
- Install Serverless plugins:
    make init
- Install Python toolchain:
    python3 -m pip install (--user) tox black pip-tools
- If running with the --user flag, add $HOME/.local/bin to $PATH.
Input is validated with JSON Schema; see the models under schema/.
Code is formatted using black: make format
Tests are run using tox: make test
For tests and linting we use pytest, flake8 and black.
Deployment to both dev and prod happens automatically via GitHub Actions on push to main. Alternatively, you can deploy from your local machine with make deploy or make deploy-prod.
The metadata API is structured around one base concept, the dataset:
+-- dataset-id=my-dataset
|   +-- version=1
|   |   +-- edition=20190101T105900
|   |   |   +-- distribution=filename.txt
|   |   |   +-- distribution=foo.txt
|   |   +-- edition=20200101T105900
|   |       +-- distribution=presentation.md
|   +-- version=2
|       +-- edition=20200101T105900
|       |   +-- distribution=otherfile.md
|       +-- edition=20210101T105900
Each edition is addressed by the path dataset/version/edition, for example my-dataset/1/20190101T105900.
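A minimal sketch of splitting such a path into its parts, assuming edition IDs always use the compact timestamp form shown in the tree (YYYYMMDDTHHMMSS). EditionPath and parse_edition_path are hypothetical names, not part of the API.

```python
from datetime import datetime
from typing import NamedTuple


class EditionPath(NamedTuple):
    dataset_id: str
    version: str
    edition: datetime


def parse_edition_path(path: str) -> EditionPath:
    """Split 'dataset/version/edition' and parse the edition timestamp."""
    # Raises ValueError if the path does not have exactly three segments.
    dataset_id, version, edition = path.split("/")
    # Assumes the compact timestamp format used in the examples above.
    return EditionPath(dataset_id, version, datetime.strptime(edition, "%Y%m%dT%H%M%S"))


parsed = parse_edition_path("my-dataset/1/20190101T105900")
```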
Versions and editions each have an entry named latest (a reserved name for versions and editions) that always contains the version or edition most recently POSTed to that resource. Use datasets/my-dataset/versions/latest to get the latest version, and datasets/my-dataset/versions/1/editions/latest to get the latest edition of version 1.
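The behaviour of latest can be modelled as a pointer that every POST moves, regardless of how the IDs sort. This is a hypothetical in-memory sketch for illustration only; Collection and LATEST are not names from the service.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

LATEST = "latest"  # reserved name; cannot be used as a real version or edition ID


@dataclass
class Collection:
    """Hypothetical store for the versions of a dataset or the editions of a version."""

    items: Dict[str, dict] = field(default_factory=dict)
    latest_id: Optional[str] = None

    def post(self, item_id: str, doc: dict) -> None:
        if item_id == LATEST:
            raise ValueError("'latest' is a reserved name")
        self.items[item_id] = doc
        self.latest_id = item_id  # the most recently POSTed item becomes latest

    def get(self, item_id: str) -> dict:
        if item_id == LATEST:
            if self.latest_id is None:
                raise KeyError(LATEST)
            item_id = self.latest_id
        return self.items[item_id]


versions = Collection()
versions.post("1", {"version": "1"})
versions.post("2", {"version": "2"})
```

Here versions.get("latest") returns the document for version 2, because it was POSTed last.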
The schema definitions used to validate the example payloads below are in schema/*.json.
- Create dataset: a valid Keycloak access token in the request header:
  "Authorization": f"Bearer {accessToken}"
- Create or update version or edition: a valid Keycloak access token and owner access to :dataset-id
- List dataset/version/edition: a logged-in user
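A minimal sketch of attaching the access token to a request using only Python's standard library. BASE_URL and build_request are hypothetical names, and obtaining the token from Keycloak is outside the scope of this sketch.

```python
import json
import urllib.request
from typing import Optional

# Hypothetical host; substitute the real metadata API URL.
BASE_URL = "https://api.example.com"


def build_request(
    method: str, path: str, access_token: str, body: Optional[dict] = None
) -> urllib.request.Request:
    """Build (but do not send) a request carrying a bearer token."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    req.add_header("Authorization", f"Bearer {access_token}")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


req = build_request("PATCH", "/datasets/my-dataset", "<token>", {"title": "New title"})
```

The request is only built here; send it with urllib.request.urlopen(req).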
GET /datasets
Lists all available datasets. An optional query parameter, parent_id, filters the list by parent dataset.
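A small sketch of building the list URL with the optional parent_id filter, using the standard library to encode the query string. BASE_URL and datasets_url are hypothetical names.

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical host; substitute the real metadata API URL.
BASE_URL = "https://api.example.com"


def datasets_url(parent_id: Optional[str] = None) -> str:
    """URL for GET /datasets, optionally filtered by parent dataset."""
    url = f"{BASE_URL}/datasets"
    if parent_id is not None:
        # urlencode escapes any characters that are not safe in a query string.
        url += "?" + urlencode({"parent_id": parent_id})
    return url
```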
POST /datasets
{
"title": "Besøksdata gjenbruksstasjoner",
"description": "Sensordata fra tellere på gjenbruksstasjonene",
"keywords": ["avfall", "besøkende", "gjenbruksstasjon"],
"frequency": "hourly",
"accessRights": "public",
"privacyLevel": "green",
"objective": "Formålsbeskrivelse",
"contactPoint": {
"name": "Tim",
"email": "[email protected]",
"phone": "98765432"
},
"publisher": "REN"
}
This creates a dataset with the ID besoksdata-gjenbruksstasjoner; the ID is derived from the dataset title. If a dataset with the same ID already exists, a random set of characters is appended to the ID (e.g. besoksdata-gjenbruksstasjoner-5C5uX).
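The exact derivation rule is not spelled out here. The sketch below is one plausible approach, assuming the title is transliterated to ASCII (ø to o, æ to ae, å to a), lowercased, runs of other characters collapsed to hyphens, and a random 5-character alphanumeric suffix added on collision to match the example. slugify and generate_dataset_id are hypothetical names, not the service's code.

```python
import random
import re
import string
import unicodedata
from typing import Container

# Assumed mapping for Norwegian letters that NFKD normalization does not decompose.
_SPECIAL = str.maketrans({"ø": "o", "Ø": "o", "æ": "ae", "Æ": "ae"})


def slugify(title: str) -> str:
    """Turn a title into a lowercase, hyphen-separated ASCII ID."""
    text = title.translate(_SPECIAL)
    # NFKD splits letters like 'å' into 'a' plus a combining mark, which the ASCII encode drops.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    text = re.sub(r"[^a-zA-Z0-9]+", "-", text).strip("-")
    return text.lower()


def generate_dataset_id(title: str, existing_ids: Container[str], suffix_len: int = 5) -> str:
    """Use the slug as-is if free, otherwise append a random suffix until unique."""
    base = slugify(title)
    if base not in existing_ids:
        return base
    alphabet = string.ascii_letters + string.digits
    while True:
        suffix = "".join(random.choices(alphabet, k=suffix_len))
        candidate = f"{base}-{suffix}"
        if candidate not in existing_ids:
            return candidate
```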
PUT /datasets/:dataset-id
{
"title": "Besøksdata gjenbruksstasjoner oppdatert tittel",
"description": "Sensordata fra tellere på gjenbruksstasjonene",
"keywords": ["avfall", "besøkende", "gjenbruksstasjon"],
"frequency": "hourly",
"accessRights": "public",
"privacyLevel": "green",
"objective": "Formålsbeskrivelse",
"contactPoint": {
"name": "Tim",
"email": "[email protected]",
"phone": "11111111"
},
"publisher": "REN"
}
Updates a single dataset (:dataset-id), replacing the old JSON document.
PATCH /datasets/:dataset-id
{
"title": "Besøksdata gjenbruksstasjoner kun oppdatert tittel"
}
Partially updates a single dataset (:dataset-id). Note that patching is shallow at the top level: a nested value such as contactPoint is replaced as a whole, just as with PUT. For example, phone must be supplied in the following PATCH even though only name and email are being changed; if phone were omitted, it would be removed.
PATCH /datasets/:dataset-id
{
"contactPoint": {
"name": "Kim",
"email": "[email protected]",
"phone": "11111111"
}
}
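The top-level shallow semantics can be modelled as a plain dictionary update. This is a sketch of the behaviour described above, not the service's implementation; shallow_patch is a hypothetical name.

```python
def shallow_patch(current: dict, patch: dict) -> dict:
    """Top-level shallow merge: each key in `patch` replaces the whole value in `current`."""
    merged = dict(current)  # copy so the stored document is not mutated
    merged.update(patch)  # nested dicts are replaced, not merged
    return merged


current = {"title": "Besøksdata", "contactPoint": {"name": "Tim", "phone": "11111111"}}
patched = shallow_patch(current, {"contactPoint": {"name": "Kim"}})
print(patched["contactPoint"])  # {'name': 'Kim'}  (phone is gone)
```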
GET /datasets/:dataset-id
POST /datasets/:dataset-id/versions
{
"version": "1",
"schema": {},
"transformation": {}
}
The value of version becomes :version-id in the examples below.
PUT /datasets/:dataset-id/versions/:version-id
{
"version": "1",
"schema": {},
"transformation": {}
}
Updates a single version (:version-id), replacing the old JSON document. The version key in the body must keep the same value as :version-id.
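A client could guard against accidental mismatches before sending the PUT. check_version_put is a hypothetical helper; the service presumably performs its own validation.

```python
def check_version_put(version_id: str, body: dict) -> None:
    """Raise if the body's 'version' differs from the :version-id in the path."""
    if body.get("version") != version_id:
        raise ValueError(
            f"body version {body.get('version')!r} does not match path version {version_id!r}"
        )


check_version_put("1", {"version": "1", "schema": {}, "transformation": {}})  # passes
```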
GET /datasets/:dataset-id/versions/:version-id
GET /datasets/:dataset-id/versions/latest
Gets the latest version created on :dataset-id.
POST /datasets/:dataset-id/versions/:version-id/editions
{
"description": "Data for one hour",
"startTime": "2018-12-21T08:00:00+01:00",
"endTime": "2018-12-21T09:00:00+01:00"
}
startTime is inclusive; endTime is exclusive.
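Since startTime is inclusive and endTime is exclusive, an edition covers the half-open range [startTime, endTime). A consumer-side sketch of that check follows; in_edition is a hypothetical helper, and timezone offsets are respected because the timestamps are parsed as aware datetimes.

```python
from datetime import datetime


def in_edition(timestamp: str, start_time: str, end_time: str) -> bool:
    """True if timestamp falls in the half-open range [startTime, endTime)."""
    ts = datetime.fromisoformat(timestamp)
    return datetime.fromisoformat(start_time) <= ts < datetime.fromisoformat(end_time)


start, end = "2018-12-21T08:00:00+01:00", "2018-12-21T09:00:00+01:00"
```

The half-open convention lets consecutive hourly editions share boundaries without counting any instant twice.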
GET /datasets/:dataset-id/versions/:version-id/editions/:edition-id
GET /datasets/:dataset-id/versions/:version-id/editions/latest
Gets the latest edition created on :version-id.
POST /datasets/:dataset-id/versions/:version-id/editions/:edition-id/distributions
{
"filename": "visitors.csv",
"format": "text/csv",
"checksum": "..."
}
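The checksum algorithm is not specified in this README. Assuming a hex-encoded SHA-256 digest (an assumption; check the distribution model in schema/*.json), a client could compute the value by streaming the file in chunks. sha256_checksum is a hypothetical helper.

```python
import hashlib
import io
from typing import BinaryIO


def sha256_checksum(stream: BinaryIO, chunk_size: int = 65536) -> str:
    """Hex SHA-256 digest of a binary stream, read in chunks to bound memory use.

    SHA-256 is an assumption; the README does not name the algorithm.
    """
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


# Usage with an in-memory file; for a file on disk, pass open("visitors.csv", "rb").
body = {
    "filename": "visitors.csv",
    "format": "text/csv",
    "checksum": sha256_checksum(io.BytesIO(b"time,visitors\n")),
}
```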