Data store for hosted, and optionally controlled, datasets for BAS Field Operations and Air Unit.
This Data Store is a platform provided by the British Antarctic Survey (BAS) Mapping and Geospatial Information Centre (MAGIC) for hosting vector geospatial data used by BAS Operational Teams.
It consists of datasets used and controlled by the BAS Air Unit and Field Operations teams for planning and delivering field seasons. Some key datasets are further controlled by MAGIC for enhanced data management and automation.
Note: This project is focused on needs within the British Antarctic Survey. It has been open-sourced in case it's of use to others with similar or related needs. Some resources, indicated with a 🔒 symbol, are not accessible publicly.
This project is a mature alpha. The aim is to provide a robust, useful, live service within the next few field seasons.
The aim for the 2023-2024 season was to provide an initial minimal implementation that meets core user needs.
Usage and feedback of this minimal implementation is helping guide which improvements and new features are needed for the 2024-2025 season.
As an alpha project, all, or parts, of this service:
- may, but should not, stop working (due to regressions or instability)
- may not work correctly, or as expected (including destructively)
- may change at any time (in terms of implementation or functionality)
- may have missing or outdated documentation
Note: Support for this project is provided on a best efforts, 'as is', basis.
WARNING: Outputs from this project should not be relied upon for operational use without thorough scrutiny.
In relation to operations that can be performed by end-users vs. platform operators (MAGIC):
- end-users cannot define new Controlled datasets/layers themselves
- end-users cannot add, change or remove fields/attributes in Controlled datasets themselves
Note: These are intentional limitations to ensure controlled datasets are suitably designed and can be supported.
In all these cases, end-users will need to request changes by contacting a member of MAGIC.
In relation to dataset Permissions, support for the following is not yet available:
- running the Azure to LDAP group sync automatically
- removing an Azure group from a role once synced
- users belonging to more than one role/team (i.e. a user cannot be in Air Unit and Field Ops team)
- restricting QGIS layer styles to layers a user is authorised to see #196 🔒
- automatically setting default session roles for new users
In relation to dataset and database backups:
- backups are not verified as being accurate or usable
- uncontrolled/planning datasets are not included in backups
This project is limited to the technical and operational aspects of providing a platform for hosting datasets. Other projects focus on these datasets and other aspects of providing a wider Geographic Information System (GIS) to BAS Operations.
This project is an evolution of a previous Experiment 🔒.
End-users should consult the documentation developed for the wider Field Operations GIS Platform 🔒, as this project is not aimed at end-users directly.
Note: This CLI is intended for use by MAGIC team members, not end-users.
A command line interface, `ods-ctl`, is available for performing administrative tasks.
$ ods-ctl --help
This command will need to be run from within a deployed instance of this project. See the Infrastructure section for connection details (specifically the relevant app server).
Note: Calling the CLI without any command will not return any output. This is expected.
Note: Currently all log entries down to the info level are displayed alongside programme output. This includes a message from the `shapely` package that can be safely ignored:
2023-10-23 14:01:29 - shapely.speedups._speedups - INFO - Numpy was not imported, continuing without requires()
- `ods-ctl auth check`: verifies authentication/authorisation services are available
- `ods-ctl auth sync --azure-group [group-id] --ldap-group [group-id]`: syncs members of Azure groups to an LDAP group
  - the `--azure-group` option can be repeated to merge multiple source groups to a single target group
- `ods-ctl backup now`: combines `ods-ctl data backup` and `ods-ctl db backup` as part of a managed, rolling, backup set
- `ods-ctl config check`: verifies required configuration options have been set
- `ods-ctl config show`: displays the current application configuration
- `ods-ctl data backup --output-path [path/to/file.gpkg]`: saves datasets and styles to a GeoPackage backup [1]
- `ods-ctl data convert`: saves controlled routes and waypoints for printing and using in GPS devices

[1] Controlled Datasets and QGIS Layer Styles only.
- `ods-ctl db check`: verifies the database is available
- `ods-ctl db setup`: configures a new database for use
- `ods-ctl db backup --output-path [path/to/file.sql]`: saves the database to a SQL backup file via `pg_dump` [1]
- `ods-ctl db run --input-path [path/to/file.sql]`: runs SQL commands contained in the input file

[1] Controlled Datasets and QGIS Layer Styles only.
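For illustration, a typical sequence for applying a SQL update file and taking a one-off GeoPackage backup might look like the following (the file paths are examples only):

```shell
# apply a SQL update file against the database (path is illustrative)
$ ods-ctl db run --input-path /tmp/add_new_dataset.sql

# take a one-off GeoPackage backup of controlled datasets and layer styles
$ ods-ctl data backup --output-path /tmp/controlled-datasets.gpkg
```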
Note: These instructions are intended for adapting into documentation by MAGIC team members.
The included QGIS project, developed as part of the wider Field Operations and Air Unit GIS, can be used for testing editing workflows and verifying expected behaviour.
- start QGIS LTR with the `ops-data-store` profile selected
- open the included QGIS project file `qgis-project.qgz`
If needed, shortcuts can be created to start QGIS with a specific profile:
- on macOS: `open -a QGIS-LTR.app --args --profile {profile}`
- on Windows: `C:\Program Files\QGIS 3.28.7\bin\qgis-ltr-bin.exe --profile {profile}`
Note: To start QGIS normally, use the `default` profile.
Tested with QGIS 3.28.11, macOS 12.7, Ops Data Store QGIS profile version 2023-10-02.0.
Note: These instructions are intended for adapting into documentation by MAGIC team members.
Replace `{placeholder}` values with settings from the relevant database listed in the Infrastructure section, except for:
- `{database_role}`, which should be the relevant value from the Permissions Mapping section
- `{username}` and `{password}`, which should be the user's BAS LDAP credentials
- open QGIS with the relevant profile active
- from Browser pane -> PostgreSQL -> New Connection:
- (ignore or use default values for options not specified)
- Name: `Ops Data Store ({location})`
- Host: `{host}`
- Database: `{database}`
- Session Role: `{database_role}`
- Authentication -> Configurations -> Create a new authentication configuration:
- (create or enter master password)
- (if using a single-user computer (i.e. not shared), users MAY use their NERC or login password for this)
- (ignore or use default values for options not specified)
- Name: `BAS LDAP ({location})`
- Type: Basic Authentication
- Username: `{username}`
- (users MUST NOT use their email address - e.g. 'conwat' not '[email protected]')
- Password: `{password}`
- Save
- (create or enter master password)
- Test Connection
- OK
Tested with QGIS 3.28.11, macOS 12.7, Ops Data Store QGIS profile version 2023-10-02.0.
Note: These instructions are intended for adapting into documentation by MAGIC team members.
Note: These instructions depend on the Adding DB Connection workflow.
Note: These instructions relate to adding existing layers. See the Creating Planning DB Layers section for adding a new layer.
- open QGIS with the relevant profile and project active
- from the Browser pane -> PostgreSQL -> Ops Data Store
- (the user may have multiple Ops Data Store entries they'll need to pick from based on their current location)
- select relevant layer -> Add Layer to Project
Note: As QGIS exposes database schemas as part of layer hierarchy, users will need to pick from the relevant schema.
For Controlled Datasets the schema is `controlled`.
Tested with QGIS 3.28.11, macOS 12.7, Ops Data Store QGIS profile version 2023-10-02.0.
Note: These instructions are intended for adapting into documentation by MAGIC team members.
Note: These instructions depend on the Adding DB Layer workflow.
To configure a layer for the first time:
- from the Layers pane -> (added layer) -> Properties -> Attribute Form:
  - for the pk, pid, updated_at, updated_by, lat_dd, lon_dd, lat_ddm and lon_ddm fields:
    - General -> Editable: Uncheck
    - Widget Type: Hidden
- set other layer properties (aliases, symbology, labelling, etc.) as needed
- Style -> Save as Default -> Datasource Database -> Ok
Note: The first layer style saved to the database will automatically create a `public.layer_styles` table.
Note: The QGIS Layer Styles table is stored in the `public` Postgres schema (as it is shared).
Tested with QGIS 3.28.11, macOS 12.7, Ops Data Store QGIS profile version 2023-10-02.0.
Note: These instructions are intended for adapting into documentation by MAGIC team members.
Note: These instructions depend on the Configure Layer Properties workflow.
To edit features in the route layer, snapping needs to be configured for the waypoint layer, as route features are constructed from waypoint features.
- enable the snapping toolbar
- click the Enable Snapping button from the snapping toolbar
- click the Advanced Configuration button from the snapping toolbar and change from All Layers to Advanced Configuration
- click the Edit Advanced Configuration button from the snapping toolbar and edit the configuration to:
- set Waypoints as the only available snapping layer
- set the tolerance to `80000.0` meters (80 km)
Tested with QGIS 3.28.11, macOS 12.7, Ops Data Store QGIS profile version 2023-10-02.0.
Note: These instructions are intended for adapting into documentation by MAGIC team members.
Note: These instructions depend on the Configure Layer Properties and Configure Layer Snapping workflows.
Layers should be editable as normal. The add/edit feature form should already be configured to hide generated or platform fields and set appropriate aliases.
When editing the Route layer, all vertices MUST overlap with existing waypoints. QGIS will allow any coordinate to be selected, however when saved any vertex that does not overlap will be omitted due to the way routes are saved.
Note: These instructions are intended for adapting into documentation by MAGIC team members.
Note: These instructions depend on the Adding DB Connection workflow.
New layers can be created in the Field Ops Planning DB schema as normal.
Users SHOULD be advised to:
- use the 'Text, unlimited length (text)' data type for string data (rather than VARCHAR or similar)
- create a default style and save to the data source (database)
Note: These instructions are intended for adapting into documentation by MAGIC team members.
To access the PDF, CSV, GPX and FPL outputs for waypoints and routes in the BAS Air Unit travel network:
- visit `https://{placeholder}/content/air-unit-network/` in a web browser
- download the desired outputs (outputs are organised by format)

Replace `{placeholder}` values with settings from the relevant database listed in the Infrastructure section.
Database and dataset backups are created automatically every day. A limited number of backups are kept before the oldest is replaced. Use the `config show` application CLI command to show where backups are stored and how many will be kept.

If needed, a manual backup can be created using the `backup now` application CLI command. Manual backups count towards the maximum number of backups.
Within a given platform instance, see the relevant 1Password entry in the Infrastructure section for where backups are stored. Within this directory, open the relevant JSON state file and check the `meta.newest_iteration_sha1sum` property, e.g. "9e0f11af67e08e9368ff3d94445826fd2014df9b". Find this checksum in the `iterations` list for details including the date of the backup and its location.
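As a minimal sketch, assuming the `jq` tool is available and using an illustrative state file path (real state files sit alongside each backup set), the newest backup can be looked up like so:

```shell
# path is illustrative - use the backup directory from the relevant 1Password entry
STATE_FILE=/var/opt/ops-data-store/backups/db_backup.state.json

# checksum of the newest backup iteration
NEWEST=$(jq -r '.meta.newest_iteration_sha1sum' "$STATE_FILE")

# details of that iteration, including its creation date and current path
jq --arg sha "$NEWEST" '.iterations[$sha]' "$STATE_FILE"
```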
Permissions are assigned to users through the groups they belong to, which are mapped to roles (e.g. viewer).
To assign a user to a role:
- ask the relevant Azure group owner to add or remove users as needed [1]
- run the `auth sync` CLI command to update the BAS LDAP server
- wait for the next BAS IT User Sync
- ask IT to run their script to set the relevant role as the default session role for the user
Note: Users may only be assigned to a single role.
To revoke a role from a user:
- ask the relevant Azure group owners to remove the user from all groups that map to that role
- run the `auth sync` CLI command to update the BAS LDAP server
- wait for the next BAS IT User Sync
[1] If needed, BAS IT can also update team/group memberships in addition to owners.
This section relates to Controlled datasets.
- update `dataset-schemas.sql` using the template [1] with these changes:
  - replace `NEW_DATASET` with the singular, lower case, name of the new dataset (e.g. 'cave' not 'CAVES')
    - for multi-word names use underscores as a separator (e.g. 'moon_base' not 'moon-base')
  - if the dataset does not include a Dataset identifier, remove the `id` column
  - add dataset specific columns as needed:
    - ensure to use `TEXT` for string fields rather than `VARCHAR`, using constraints to validate lengths
- update `grants.tpl.sql` to grant relevant roles and ad-hoc users write access
- copy the statements creating the new dataset to a separate update file (e.g. `add_NEW_DATASET.sql`)
- add Grants to the update file to apply schema wide read access grants to the new table [2]
- copy the update file to each application instance and apply it using the `db run` command
- add the new table as a layer in QGIS and configure as needed (e.g. form fields, aliases, symbology)
- save the layer properties/style back to the data source and verify an entry is added to the QGIS `public.layer_styles` table
[1]
-- NEW_DATASET
CREATE TABLE IF NOT EXISTS controlled.NEW_DATASET
(
pk INTEGER GENERATED ALWAYS AS IDENTITY
CONSTRAINT NEW_DATASET_pk PRIMARY KEY,
pid UUID NOT NULL UNIQUE DEFAULT generate_ulid(),
id TEXT NOT NULL,
updated_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT now(),
updated_by TEXT NOT NULL DEFAULT 'unknown',
geom GEOMETRY(Point, 4326),
lat_dd TEXT GENERATED ALWAYS AS (st_y(geom)::text) STORED,
lon_dd TEXT GENERATED ALWAYS AS (st_x(geom)::text) STORED,
lat_ddm TEXT GENERATED ALWAYS AS ((geom_as_ddm(geom)).y) STORED,
lon_ddm TEXT GENERATED ALWAYS AS ((geom_as_ddm(geom)).x) STORED
);
CREATE INDEX IF NOT EXISTS NEW_DATASET_geom_idx
ON controlled.NEW_DATASET USING gist (geom);
CREATE OR REPLACE TRIGGER NEW_DATASET_updated_at_trigger
BEFORE INSERT OR UPDATE
ON controlled.NEW_DATASET
FOR EACH ROW
EXECUTE FUNCTION set_updated_at();
CREATE OR REPLACE TRIGGER NEW_DATASET_updated_by_trigger
BEFORE INSERT OR UPDATE
ON controlled.NEW_DATASET
FOR EACH ROW
EXECUTE FUNCTION set_updated_by();
[2] Copy any statements from the [CONTROLLED SCHEMA] section of `grants.tpl.sql` which begin `GRANT SELECT ON ALL ...`.
This section relates to Controlled datasets.
Note: This section is a work in progress and may be incomplete.
- update `schemas.sql` making any relevant changes
- create an update file with the statements to change the relevant entities in place (e.g. [1])
- copy the update file to each application instance and apply it using the `db run` command
- in QGIS, update the properties for the updated layer/table as needed and save to the QGIS `public.layer_styles` table
Note: QGIS will create a new entry in the `public.layer_styles` table, with the previous style/entry no longer set as a default. These previous styles can be removed if needed.
[1] E.g.:
ALTER TABLE EXISTING_DATASET
ADD COLUMN NEW_COLUMN TEXT;
This section relates to Controlled datasets.
Note: This section is a work in progress and may be incomplete.
After dropping the relevant table, manually remove any styles for the removed layer in the QGIS `public.layer_styles` table.
End-users are intended to create new planning datasets themselves, using QGIS as a database client.
See the Creating Planning DB Layers (in QGIS) section for more information.
Typer is used as the framework for the control CLI.
The control CLI uses a series of settings for connecting to external services and reporting internal information. Some settings are read-only, such as the application version. Others, such as database connection details, must be defined by the user, using either appropriate environment variables or an `.env` file.

Note: If using an `.env` file, an example `.example.env` is available as a guide.
| Config Property | Environment Variable | Required | Sensitive | Read Only | Type | Description | Example |
| --- | --- | --- | --- | --- | --- | --- | --- |
| `AUTH_AZURE_AUTHORITY` | `APP_ODS_AUTH_AZURE_AUTHORITY` | No [2] | No | No | String | Endpoint used for authenticating against Azure | 'https://login.microsoftonline.com/b311db95-32ad-438f-a101-7ba061712a4e' |
| `AUTH_AZURE_CLIENT_ID` | `APP_ODS_AUTH_AZURE_CLIENT_ID` | No [2] | No | No | String | Identifier used for authenticating against Azure | '3b2c5acf-728a-4b78-85f0-9560a6aad701' |
| `AUTH_AZURE_CLIENT_SECRET` | `APP_ODS_AUTH_AZURE_CLIENT_SECRET` | No [2] | Yes | No | String | Secret used for authenticating against Azure | 'xxx' |
| `AUTH_AZURE_SCOPES` | - | No [2] | No | Yes | List of Strings | Permissions requested when authenticating against Azure | ['https://graph.microsoft.com/.default'] |
| `AUTH_LDAP_BASE_DN` | `APP_ODS_AUTH_LDAP_BASE_DN` | No [2] | No | No | String | Base scope to apply to all LDAP queries | 'dc=example,dc=com' |
| `AUTH_LDAP_BIND_DN` | `APP_ODS_AUTH_LDAP_BIND_DN` | No [2] | No | No | String | Identifier used for authenticating against LDAP server | 'cn=app,ou=apps,dc=example,dc=com' [3] |
| `AUTH_LDAP_BIND_PASSWORD` | `APP_ODS_AUTH_LDAP_BIND_PASSWORD` | No [2] | Yes | No | String | Secret used for authenticating against LDAP server | 'xxx' |
| `AUTH_LDAP_CXT_GROUPS` | `APP_ODS_AUTH_LDAP_CXT_GROUPS` | No [2] | No | No | String | LDAP naming context prefix used to identify groups | 'cn' [3] |
| `AUTH_LDAP_CXT_USERS` | `APP_ODS_AUTH_LDAP_CXT_USERS` | No [2] | No | No | String | LDAP naming context prefix used to identify users | 'cn' [3] |
| `AUTH_LDAP_OU_GROUPS` | `APP_ODS_AUTH_LDAP_OU_GROUPS` | No [2] | No | No | String | Scope for group related objects in LDAP server | 'groups' |
| `AUTH_LDAP_OU_USERS` | `APP_ODS_AUTH_LDAP_OU_USERS` | No [2] | No | No | String | Scope for user related objects in LDAP server | 'users' |
| `AUTH_LDAP_URL` | `APP_ODS_AUTH_LDAP_URL` | No [2] | No | No | String | Endpoint used for authenticating against LDAP server | 'ldap://ldap.example.com:389' |
| `AUTH_MS_GRAPH_ENDPOINT` | - | No [2] | No | Yes | String | Endpoint used for the Microsoft Graph API | 'https://graph.microsoft.com/v1.0' |
| `BACKUPS_COUNT` | `APP_ODS_BACKUPS_COUNT` | Yes | No | No | Number | Number of backups to keep as part of a rolling window | '10' |
| `BACKUPS_PATH` | `APP_ODS_BACKUPS_PATH` | Yes | No | No | String (Path) | Location to store application backups [4] | '/var/opt/ops-data-store/backups/' |
| `DATA_AIRNET_OUTPUT_PATH` | `APP_ODS_DATA_AIRNET_OUTPUT_PATH` | Yes | No | No | String (Path) | Location to store Air Unit Network exports [4] | '/var/www/ops-data-store/air-unit-outputs/' |
| `DATA_AIRNET_ROUTES_TABLE` | - | No | No | Yes | String | Name of database table used for Air Unit Network routes | 'route_container' |
| `DATA_AIRNET_ROUTE_WAYPOINTS_TABLE` | - | No | No | Yes | String | Name of database table used for Air Unit Network route waypoints | 'route_waypoint' |
| `DATA_AIRNET_WAYPOINTS_TABLE` | - | No | No | Yes | String | Name of database table used for Air Unit Network waypoints | 'waypoint' |
| `DATA_MANAGED_SCHEMA_NAME` | - | No | No | Yes | String | Name of the database schema containing controlled datasets | 'controlled' |
| `DATA_MANAGED_TABLE_NAMES` | `APP_ODS_DATA_MANAGED_TABLE_NAMES` | Yes | No | No | List of Strings | Names of database tables used for controlled datasets | ['depot', 'instrument'] |
| `DATA_QGIS_TABLE_NAMES` | - | No | No | Yes | List of Strings | Names of database tables optionally used by QGIS | ['layer_styles'] |
| `DB_DSN` | `APP_ODS_DB_DSN` | Yes | Yes | No | String | Application database connection string [1] | 'postgresql://user:pass@host/db' |
| `VERSION` | - | No | No | Yes | String | Application version, read from package metadata | '0.1.0' |
[1] The `DB_DSN` config option MUST be a valid psycopg connection string.
[2] `AUTH_*` config options are required if managing authentication/authorisation aspects of the Data Store.
[3] Make sure to use the correct naming context prefix for the LDAP server, e.g. `cn=conwat` vs. `uid=conwat`.
[4] These options MUST point to an existing directory that is writable by the application user.
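As a minimal illustrative sketch, an `.env` file might look like the following. Values are placeholders and the list-typed options are omitted, as their encoding should be taken from `.example.env`:

```
# Illustrative values only - see .example.env for the authoritative template
APP_ODS_DB_DSN=postgresql://ops_data_store_app:changeme@db.example.com/ops_data_store
APP_ODS_BACKUPS_COUNT=10
APP_ODS_BACKUPS_PATH=/var/opt/ops-data-store/backups/
APP_ODS_DATA_AIRNET_OUTPUT_PATH=/var/www/ops-data-store/air-unit-outputs/
```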
The BAS Air Unit Network Dataset utility 🔒 is used to convert the Waypoint and Route Controlled Datasets maintained by the BAS Air Unit into formats for use in GPS devices and other activities such as printing.
These datasets can be converted using the `data convert` CLI command. Outputs are written to the File Store and available to end-users through the Web Server.
To avoid the need for end-users to use the CLI to generate outputs, the `data convert` CLI command can be run automatically using cron or another task scheduling tool.
For example to run with cron every 5 minutes:
*/5 * * * * /path/to/ods-ctl data convert >> /path/to/logs/$(date +\%Y-\%m-\%d-\%H-\%M-\%S-\%Z).log 2>&1
This will create per-run log files (e.g. `/path/to/logs/2023-11-20-04-00-00-UTC.log`). See the Installation section for how to configure automated conversions in a deployed instance.
Log files for automatic conversions are retained for 24 hours and then deleted via a crontab entry.
PostgreSQL is used for storing datasets. It uses the PostGIS extension for storing spatial information along with custom functions and data types for:
- creating ULIDs (stored as a UUID data-type)
  - using the `generate_ulid` custom function, which relies on the pgcrypto extension
- formatting latitude and longitude values in the Degrees, Decimal Minutes format (DDM)
  - using the `geom_as_ddm` custom function and `ddm_point` custom data type
- recording when and by whom rows in controlled datasets are changed
  - using the `set_updated_at` and `set_updated_by` custom functions

All extensions, extension support views, custom functions/types and other shared objects are stored in the default `public` schema.
Datasets are stored in schemas based on how they are controlled. See the Datasets section for more information.
Database permissions form part of the Data Store's Permissions system - specifically to control who can access underlying data. Database roles, users and grants need to be created/set to implement the relevant parts of this permissions system.
Required Postgres roles are defined in the Permission Mappings section. Reference SQL statements to create these roles are defined in the `roles.sql` file.
Required Postgres users are defined by a combination of:
- the members of the relevant groups listed in the Permission Mappings section
- users to represent applications (see the Project Setup section for specific users required)
- a set of example users for each role to test/debug with (strongly recommended)
Reference SQL statements to create these users are defined in the `users.tpl.sql` file.
Note: This reference is a template. Values that need substituting with specific values based on how the database has been implemented are indicated with `{{ default/conventional/example value }}`.
Note: For BAS IT managed databases, the BAS IT User Sync will create required database users, including application users. A set of example users, based on the reference example users, should also be created and assigned the relevant roles by IT manually. See the relevant Infrastructure subsection for their credentials.
Postgres grants are required to implement the rights described in the Permission Mappings section. Reference SQL statements for these grants are defined in the `grants.tpl.sql` file.
Note: This reference is a template. Values that need substituting with specific values based on how the database has been implemented are indicated with `{{ default/conventional/example value }}`.
The file store holds application Backups and Converted Outputs. It is typically a file system, either using local disks or remote file shares. Other technologies, such as objects stores, cannot currently be used but could be supported if needed.
The file store may also be used for hosting the CLI and/or Database, however this is not required, and not common particularly for the database.
The file system enforces permissions via owner/group and ACL mechanisms (as used by the Web Server component).
Each operating Environment uses a separate file store. I.e. there is no single common/global instance, despite being available from the same mount point in all environments (for consistency).
QGIS is the GIS client end-users will use for editing and visualising geospatial data, and acts as the database client in relation to this project specifically.
For consistency and compatibility, the QGIS profile and project developed for the wider GIS are used.
It is recommended that QGIS layer styles are stored in the platform Database such that they are centrally maintained. The `layer_styles` table QGIS creates should be held in the `public` schema, as it is a shared resource and not relevant to how datasets are organised.
Note: Layer styles are accessible to all authenticated users, and so the names and attributes of the table/view/layer each style belongs to can be found through this table, even if the related layer is otherwise restricted. This is an information leak and should be considered when naming layers and their attributes, if the names themselves are sensitive.
SVGs used as part of layer symbologies are stored in `./content/qgis/symbology/svgs/` within the File Store and available through the Web Server for referencing in styles.
Microsoft Entra (Azure Active Directory) is the Identity Provider (IDP) used by NERC, within which BAS sits. An application registration represents this project within Azure, granting it permission to relevant resources within the Microsoft Graph API.
A client secret (and non-secret client ID) is used for this project to identify its application registration. Separate secrets are used for each instance and need to be rotated every 6 months (180 days).
Currently, app registration secrets are rotated manually:
- from the Application Registration in the Azure Portal:
- from the Certificates & secrets section, remove the existing secret
- from the same section, create a new client secret with the same description and 180 day expiry date
- update the `APP_ODS_AUTH_AZURE_CLIENT_SECRET` environment variable (via a `.env` file or through some other method) in the relevant instance
The Microsoft Graph API provides programmatic access to Microsoft 365 resources - for this project, the members of specified Microsoft Teams.
Microsoft Teams is a collaboration tool within the Microsoft 365 platform used by a number of BAS departments and projects. Teams use Azure groups to record their members, which this project can access via the Microsoft Graph API as the basis for the Data Store's Permissions system.
LDAP is the Identity Provider (IDP) used by BAS IT, specifically for unix systems including Postgres and Apache. It is used for both authentication and authorisation through a number of application controlled groups. Group members are synced from Azure groups representing Microsoft Teams as part of the Data Store's Permissions system.
Apache HTTP Server is used for hosting content from the File Store, making it available to end users. It is managed by BAS IT and uses a virtual host configured in `/etc/httpd/sites/10-ops-data-store.conf`. Automatic directory listings are disabled for this site.
The web server includes:
- outputs for converted Routes and Waypoints datasets, restricted using LDAP
- users must have a valid LDAP account and be a member of the Viewer role from the Permissions system
- automatic directory listings are enabled for this content
- QGIS Layer SVGs
- files are accessible anonymously so they can be used in QGIS layer styles
Where restricted content needs to be shared through the web server, special file permissions are needed for the web server to access content whilst preventing users accessing files directly through the file system, which would bypass restrictions enforced by the web server.
Conventional file permissions are set for a parent directory, preventing world/other read (and execute for directories). An additional NFS ACL is set granting these removed permissions to the web server user only.
To read a file from the file system, read access is required on the file itself and all parent directories up to and including root (`/`). The restrictive file permissions on the parent folder therefore prevent users accessing restricted content via the file system, even if they know the path to a file or directory.
Note: Files or directories within this parent directory require world/other read (and execute for directories) permissions for the web server to access them. I.e. only the parent directory should use restricted permissions.
These permissions and ACLs are set automatically in BAS IT managed instances.
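For illustration only, the effect of this scheme is roughly equivalent to the following. POSIX ACL commands are shown for simplicity (the real mechanism uses NFS ACLs applied by BAS IT), and the path and web server user are assumptions:

```shell
# remove world/other read and execute on the restricted parent directory
chmod o-rx /var/www/ops-data-store

# re-grant read and execute to the web server user only (illustrative POSIX ACL)
setfacl -m u:apache:rx /var/www/ops-data-store
```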
As part of the Data Store's Permissions system, a mechanism has been implemented by BAS IT to:
- replicate relevant objects between LDAP servers
- create, update or remove Database users and their permissions based on LDAP objects
This mechanism is only accessible to BAS IT and any issues will need resolving via the BAS IT Service Desk. IT have access to a log file which records the results of recent syncs.
Note: Inconsistencies can occur after a sync, however these should normally resolve themselves in the next sync.
For reference, this mechanism runs on a daily schedule at:
- 09:00 & 09:15 UTC (06:00 & 06:15 Rothera)
- 15:00 & 15:15 UTC (12:00 & 12:15 Rothera)
- 21:00 & 21:15 UTC (18:00 & 18:15 Rothera)
(Where the first time is the LDAP to LDAP sync and the second LDAP to Postgres. Rothera is UTC -03:00).
For replicating LDAP objects between servers:
- the Cambridge (Production) LDAP server is considered the source of truth
- application LDAP groups, and users these groups contain as members, are replicated to LDAP servers in other environments
- credentials for users are replicated, ensuring users can use the same credentials in all environments
- user credentials will be updated in other environments if changed in the source LDAP server at the next sync execution
- users that are no longer members of groups will be reflected in other environments at the next sync execution
For managing database users based on LDAP objects:
- base Postgres roles are created for each application role/group, granted relevant access to database objects
- for each application LDAP group, each member is created as a Postgres user role, named after the LDAP UID attribute
- these Postgres user roles are configured to inherit the relevant base Postgres role, granting them relevant permissions
- user credentials are not replicated to these user roles as Postgres is configured to use LDAP authentication
- user roles for users removed from an LDAP group will no longer inherit from the relevant base role, revoking permissions
- user roles may inherit from multiple base Postgres roles if members of multiple application LDAP groups
To provide Field Operations with satellite imagery for areas they are interested in, MAGIC routinely searches for and downloads imagery for a given series of Areas of Interest (AOIs) using a script. This data is then synced to a nominated location where Field Operations can access it.
Whilst affiliated with this project (and set up under it), this script/functionality is not formally part of the Data Store but is facilitated by it.
See the MAGIC Remote Sensing project for more information.
To facilitate this script, this project provides:
- a controlled dataset for AOIs (`controlled.eo_acq_aoi`)
- a database user with read access to this dataset (`ods_app_eo_acq_script`)
See the Database Permissions section for how this user is managed and assigned permissions.
See MAGIC/ops-data-store#198 🔒 for how this was set up.
Datasets hosted in this platform are restricted as to who can read and/or edit from them. The platform includes a simple permissions system to enforce these restrictions.
Note: This permissions system does not apply to QGIS Layer Styles.
- for users representing end-users (individuals), a role based permissions system is used
- for users representing applications, ad-hoc permissions are used
Permissions for users and roles to access/modify entities that are known to exist are implemented using Postgres permissions. This includes ad-hoc permissions needed for users representing applications.
For entities created dynamically by end-users (such as for Planning Datasets), it isn't possible to pre-define permissions, nor tenable to expect end-users to apply database grants after creating new entities. Without this, however, other users won't be able to access them.
I.e.:
- a user Alice creates a new table 'foo'
- another user Bob is unable to access it because Alice hasn't granted Bob access to it
To work around this, end-users assume a common role using `SET ROLE` (set in the QGIS database connection, for example). This means entities are created (and otherwise interacted with) as the same user, and so no grants need to be applied.
I.e.:
- a user Alice, acting as a common role `field_team`, creates a new table 'foo'
- another user Bob, also acting as the `field_team` common role, is able to access it because they are the same user
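A minimal sketch of this pattern in SQL, assuming the `field_team` common role from the example above and the `planning_field_ops` schema described under Planning Datasets:

```sql
-- Alice connects as herself, then assumes the common role
-- (QGIS does this via the Session Role connection setting)
SET ROLE field_team;

-- the new table is owned by 'field_team', so anyone else acting as that role
-- can use it without any additional grants
CREATE TABLE planning_field_ops.foo (
    pk INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name TEXT
);
```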
Note: Users must be assigned to inherit from any role they wish to act as, and must still login as themselves before acting as a common role, preventing users bypassing access restrictions.
Note: Due to a bug in QGIS, the Session Role setting is not respected. As a workaround, IT can set a default DB role for users.
Note: A side effect of this `SET ROLE` approach is that role inheritance is disabled, meaning permissions must be granted directly to each role.
This system includes three roles which can be assigned to individual users:
- admin: can view and change any information to manage and administer the platform (inc. members of MAGIC and BAS IT)
- owner: can change and view information
- viewer: can read information only
Note: Users can be assigned to multiple roles but may only assume one role at a time within a session.
I.e. If a user is assigned the owner and viewer roles they may act as either the owner or viewer role, not both.
Note: Users representing applications are assigned permissions directly, rather than via one of these roles.
The admin and viewer roles are global (they apply to all datasets). The owner role is scoped to a particular team, with team members only able to change datasets within their team (i.e. that their team owns). These teams are:
- BAS Field Operations
- BAS Air Unit
Note: All roles (including teams) are granted read only access to all Controlled Datasets.
E.g. For a dataset of field instruments owned by the BAS Field Operations team:
- BAS Field Operations team members can change this dataset, as their team owns the dataset
- BAS Air Unit team members cannot change this dataset because, although they hold the owner role, they are in a different team
- BAS Air Unit team members will be granted access to view the dataset
- admin role holders can change this dataset, as they can change all information in the platform
- viewer role holders can view but cannot change this dataset, as they can only view all information in the platform
These roles are implemented in the Database using roles and users:
- roles are represented as postgres roles, which are granted permissions (using Postgres grants) to read or change information in all or specific tables
- individuals are represented as postgres users, which are assigned to (inherit from) a single role
Users are assigned roles based on the membership of groups held in an LDAP server. I.e. members of LDAP group 'x' are assigned to Postgres role 'x'. LDAP group members are copied from a series of Azure groups representing Microsoft Teams. These Teams are used by departments outside of this platform and project. The intention is to prevent departments needing to maintain a duplicate membership list, which may get out of sync.
Membership information moves in one direction: from Azure (MS Teams) to LDAP, then to Postgres. Each LDAP group can include multiple Azure groups forming a union of members.
Users must exist in the LDAP server before they can be added to LDAP groups. Missing users will first need registering, at BAS by contacting the IT Service Desk.
Mappings for roles, teams, the database and LDAP:
| Role | Team | Azure Groups | LDAP Group | Postgres Role |
| --- | --- | --- | --- | --- |
| Admins | - | `34db44b7-4441-4f60-8daa-d0c192d74704` | `apps_magic_ods_write_admin` | `ods_write_admin` |
| Owners | BAS Field Operations | `75ec55c1-7e92-45e3-9746-e50bd71fcfef` | `apps_magic_ods_write_fo` | `ods_write_fo` |
| Owners | BAS Air Unit | `7b8458b9-dc90-445b-bff8-2442f77d58a9` | `apps_magic_ods_write_au` | `ods_write_au` |
| Viewers | - | [1] | `apps_magic_ods_read` | `ods_read` |
The BAS IT User Sync and the Command Line Interface, specifically commands in the `auth` command group, are used for synchronising users, and verifying users have been synced, between these systems and between environments.
[1]
Union of:
34db44b7-4441-4f60-8daa-d0c192d74704
75ec55c1-7e92-45e3-9746-e50bd71fcfef
7b8458b9-dc90-445b-bff8-2442f77d58a9
Controlled Datasets hosted in this platform, their QGIS Layer Styles and the underlying Database can be exported as file based backups, to allow undesired data changes to be recovered (i.e. where a feature is accidentally deleted).
A fixed number of backups are kept, with the oldest backup replaced by the newest when a configured limit is reached. This forms a backup window within which backups can be accessed. The length of this window depends on the number of backups to keep and how often they are made (i.e. a limit of 7 with daily backups gives a window of a week).
Note: Backups are intended as point-in-time snapshots of data, rather than long-term, data archives (which will likely require manual preparation).
Note: Backups only include controlled datasets. Additional 'uncontrolled' datasets, or any other database objects, are ignored.
Currently, two separate backups are created: GeoPackage backups and Database backups.
Backups are independent of each other, i.e. if the backup limit is 6, a maximum of 6 GeoPackage, and 6 Postgres, backups will be retained - rather than 3 of each. Backups are captured together to try and ensure consistency.
The Command Line Interface, specifically commands in the `backup` command group, can be used to create and manage backups.
GeoPackage backups:
- use a standardised, interoperable, file format for geospatial information
- can be used directly with GIS clients or other tools such as GDAL/OGR
- contain Controlled Datasets and QGIS layer style information only
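For example, a GeoPackage backup can be inspected with GDAL/OGR. The file name shown is illustrative; real backups use generic, rotated names as described under Backup Sets:

```shell
# list layers and summary information for a GeoPackage backup
$ ogrinfo -al -so /var/opt/ops-data-store/backups/data_backup_1.gpkg
```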
WARNING! GeoPackage backups are not tested/verified.
Database backups:
- use a technology specific file format for database information (Postgres dump)
- require loading into a compatible Postgres database to use
- contain all database objects (data types, tables, views, functions, triggers) from the MAGIC Managed Datasets schema and select objects from the `public` schema only
- do not contain global objects (users, roles and grants - see Permissions section)
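As an illustrative sketch (connection details and the backup file name are assumptions), a database backup can be loaded into a compatible database with `psql`:

```shell
# restore a plain-format SQL dump into an empty, compatible database
$ psql -h db.example.com -U ops_data_store_app -d ops_data_store -f /var/opt/ops-data-store/backups/db_backup_1.sql
```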
Database backups are intended to give additional confidence whilst this project is initially set up. They MUST NOT be relied upon for ensuring information not included in the GeoPackage backups, such as uncontrolled datasets, is backed up.
WARNING! Database backups are not tested/verified.
WARNING! Database backups will be withdrawn in time.
For BAS IT managed infrastructure, additional backups are maintained by BAS IT:
- Virtual Machines are backed up incrementally every day and fully every week (retained for 1 month), with monthly backups then retained for 6 months
To avoid issues with file systems that capture all files within their own managed backups (such as the BAS SAN), backups are named using generic names that are rotated. In order to tell specific backups apart a backup state file is maintained automatically alongside each backup.
This state file is encoded as JSON using the `.state.json` extension and designed to be human-readable. State files contain two sections, a `meta` section and a list of iterations. Iterations are identified using the SHA1 checksum of the file contents - as the filename will be made generic and change over time.
Example state file for a three file backup set:
{
"meta": {
"max_iterations": 3,
"iterations": 3,
"newest_iteration_sha1sum": "1f3c3cd6977c3253d6cf0a4219dbb74a791fdccb",
"schema_version": "1",
"updated_at": "2023-11-09T11:44:05.188558+00:00"
},
"iterations": {
"5a88d442089f97497fa3e2cc25827073c2d4c518": {
"sha1sum": "5a88d442089f97497fa3e2cc25827073c2d4c518",
"replaces_sha1sum": "",
"created_at": "2023-11-09T11:39:49.535065+00:00",
"original_name": "db_backup.sql",
"sequence": 1,
"path": "backups/db_backup_1.sql"
},
"cf7650f9cafda9aee8815161b9e088d820f32f4a": {
"sha1sum": "cf7650f9cafda9aee8815161b9e088d820f32f4a",
"replaces_sha1sum": "5a88d442089f97497fa3e2cc25827073c2d4c518",
"created_at": "2023-11-09T11:43:08.484397+00:00",
"original_name": "db_backup.sql",
"sequence": 2,
"path": "backups/db_backup_2.sql"
},
"1f3c3cd6977c3253d6cf0a4219dbb74a791fdccb": {
"sha1sum": "1f3c3cd6977c3253d6cf0a4219dbb74a791fdccb",
"replaces_sha1sum": "cf7650f9cafda9aee8815161b9e088d820f32f4a",
"created_at": "2023-11-09T11:44:05.183804+00:00",
"original_name": "db_backup.sql",
"sequence": 3,
"path": "backups/db_backup_3.sql"
}
}
}
The oldest backup is identified by the lowest `sequence` value (i.e. `1`) and does not have a `replaces_sha1sum` value, as it logically doesn't replace a previous backup. For other backups, the `replaces_sha1sum` value can be used to calculate the order of backups if the sequence information is lost. The order backups appear in the `iterations` list MUST NOT be used to infer the order of backups.

The `created_at` and `original_name` properties relate to the file added to the backup set (which may be named generically or include a timestamp or other unique value). These values, along with the `sha1sum`, will therefore not change.
Note: The backup system implemented within this application does not include measures to protect against file corruption. Separate processes will be needed to achieve this if needed.
Backups can be captured automatically using cron or another task scheduling tool by calling the `backup now` CLI command.
For example to run with cron every day at 04:00 (AM):
0 4 * * * /path/to/ods-ctl backup now >> /path/to/logs/$(date +\%Y-\%m-\%d-\%H-\%M-\%S-\%Z).log 2>&1
This will create per-run log files (e.g. `/path/to/logs/2023-11-20-04-00-00-UTC.log`). See the Installation section for how to configure automated backups in a deployed instance.
Log files for automatic backups are retained for 30 days and then deleted via a crontab entry.
The Sentry error tracking platform is used to monitor scheduled tasks such as Backups using the Sentry CLI.
Datasets hosted in this platform can be classed as either:
- controlled: datasets whose schemas are formally controlled by data owners (Ops) and platform operators (MAGIC) and typically represent fundamental, enduring, entities such as routes, depots, AOIs, etc.
- planning: other datasets Ops users may wish to store, with a wide range of maturity - essentially ignored by this platform
Controlled datasets are formally reviewed to ensure a suitable schema is used. All such datasets provide minimally agreed functionality, using the same base table structure, with additional fields and functionality extending from this.
Controlled datasets are stored in the `controlled` database schema. Definitions for these datasets are declared in `dataset-schemas.sql`.
This base table schema comprises:
- a set of Identifier fields
- a pair of Last Update fields
- a set of Geospatial fields
Additional fields originate from one of these other projects:
Controlled datasets are assigned to these teams in relation to the Permissions needed to change the data they contain:
| Dataset Name | Database Entity Name | Owner (Team) |
| --- | --- | --- |
| Depots | `depot` | BAS Field Operations |
| Instruments | `instrument` | BAS Field Operations |
| Waypoints | `waypoint` | BAS Air Unit |
| Routes | `route_container`, `route_waypoint`, `route` | BAS Air Unit |
| EO Acquisition AOIs | `eo_acq_aoi` | MAGIC |
I.e. the Depots dataset can only be changed by the BAS Field Operations team.
All controlled datasets have at least two identifiers, though most will have three as defined below:
| Domain | Owner | Audience | Column Name | Data Type | Format/Scheme | Required | Unique |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Technology | MAGIC/IT | MAGIC/IT | `pk` | Integer | PostgreSQL Identity | Yes | Yes |
| Platform | MAGIC | MAGIC/Ops | `pid` | UUID | ULID | Yes | Yes |
| Dataset | Ops | MAGIC/Ops | `id` | String | - | No | Optional |
For example:
| PK | PID | ID | ... |
| --- | --- | --- | --- |
| 1 | 01H26N7D9Q064B6QQMCPP5NQK0 | ALPHA | ... |
| ... | ... | ... | ... |
| 9 | 01H26N7D9SGG348R24KN6W50GX | INDIA | ... |
| ... | ... | ... | ... |
This ID is dictated by whichever technology is used to implement this platform. For Postgres, this is a primary key using an Integer identity column, as these are the easiest to manage.
WARNING! This column MUST NOT be relied upon outside each individual database instance, as it MAY NOT be the same between instances. If we use a different technology in the future, this MAY use a different concept, necessitating the loss of any existing values.
Note: This attribute SHOULD NOT be exposed to end-users, either by using database views, or configuring fields in layers, to hide the column.
This ID is dictated by us to uniquely and persistently identify features across all datasets hosted in the platform and ideally across datasets anywhere (i.e. globally unique). The ULID scheme is used to implement these identifiers as they're reasonably compact and naturally sort by time.
Once issued they do not change and SHOULD be used to distinguish features and/or for defining feature relations. Crucially this value has no meaning (to us or end-users) and is therefore neutral. This identifier MAY be exposed to end-users, though it is assumed they won't use or recognise them directly.
Note: `PID` was chosen to avoid using `FID`, as this is typically used, and possibly reserved, in GIS systems.
Note: ULIDs are stored in a Postgres UUID data type to improve indexing. This results in a non-standard representation which can appear misleading.
This ID is not required, though it's assumed almost all datasets will have a value that users use to identify features, even if only for a time limited period. Values are uncontrolled in terms of needing to be:
- unique
- applied consistently
- following a scheme or convention
- meaningful and/or recognisable by end-users
- persistent over time
Note: These properties are nevertheless recommended in any identifier.
All controlled datasets have two last update columns:
- `updated_at`: timestamp of when a row was last changed
- `updated_by`: identity of who last changed a row

Both fields are updated whenever data in a row changes via a Postgres trigger on each table, which use the `NOW()` or `session_user` function/variable respectively.
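A minimal sketch of these trigger functions, assuming plain PL/pgSQL (the authoritative definitions are part of the project's database resources):

```sql
-- illustrative only: set updated_at/updated_by before a row is inserted or updated
CREATE OR REPLACE FUNCTION set_updated_at() RETURNS trigger AS $$
BEGIN
    NEW.updated_at := NOW();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION set_updated_by() RETURNS trigger AS $$
BEGIN
    NEW.updated_by := session_user;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;
```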
The value of the 'Updated By' column should correspond to an end-user, specifically a NERC username, which can be optionally resolved to a name or other label. E.g. `conwat` can become Connie Watson.
Note: `session_user` is used rather than `current_user` due to the way Permissions grants are implemented.
For example:
| PK | PID | Updated At | Updated By | ... |
| --- | --- | --- | --- | --- |
| 1 | 01H26N7D9Q064B6QQMCPP5NQK0 | 2023-08-24 15:23:01.583312 +00:00 | conwat | ... |
| ... | ... | ... | ... | ... |
| 9 | 01H26N7D9SGG348R24KN6W50GX | 2023-10-14 09:46:23.237912 +00:00 | conwat | ... |
| ... | ... | ... | ... | ... |
All controlled datasets which implement the base schema have a PostGIS EPSG:4326 point geometry column named `geom`.
Such datasets will also have a set of derived fields to format the coordinates of this geometry in both:
- decimal degrees (DD):
  - derived using the PostGIS `st_y()` and `st_x()` functions respectively
  - held in the `lat_dd` and `lon_dd` columns
- degrees decimal minutes (DDM):
  - derived using the custom `geom_as_ddm` function
  - held in the `lat_ddm` and `lon_ddm` columns
These derived columns are generated, meaning they are computed from other table columns (in this case `geom`) and will always be in sync. These columns are inherently read only and are cast or returned as text.
Values for DDM formatted coordinates use a fixed number of decimal places (6) to match the default used for DD formatted coordinates.
For example:
| PK | PID | Geom | Lat (DD) | Lon (DD) | Lat (DDM) | Lon (DDM) | ... |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 01H26N7D9Q064B6QQMCPP5NQK0 | 0101000020E6100000F049670B7E804FC0F983F2C579D850C0 | -67.38243244822651 | -63.00384657424354 | 67° 22.945947' S | 63° .230794' W | ... |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 9 | 01H26N7D9SGG348R24KN6W50GX | 0101000020E6100000B685DCAFBB7752C06F6E39F206B852C0 | -74.87542396172078 | -73.8708305028475 | 74° 52.525438' S | 73° 52.24983' W | ... |
| ... | ... | ... | ... | ... | ... | ... | ... |
The controlled Routes dataset is non-spatial, as routes consist of an ordered sequence of Waypoint identifiers. This prevents editing of routes in QGIS, which requires a linestring geometry.
To support editing route features in QGIS, a writable view is implemented as part of the controlled dataset schemas. For reading, the view includes a derived `geom` column, built by combining the point geometries of each waypoint included in the route into a linestring. For writing, triggers split the linestring geometry into points, which are converted into waypoint identifiers using the waypoint geometry as a join. A tolerance of 1 km is used so that this spatial join does not need to match feature positions exactly, avoiding issues caused by precision differences.
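A heavily simplified sketch of the read side of such a view is shown below. Table names follow the configuration defaults, but the join and ordering column names are hypothetical, and the triggers for writing are omitted entirely:

```sql
-- illustrative only: derive a linestring geometry for each route from its ordered waypoints
CREATE OR REPLACE VIEW controlled.route AS
SELECT
    r.pk,
    r.pid,
    ST_MakeLine(w.geom ORDER BY rw.sequence) AS geom          -- 'sequence' is a hypothetical ordering column
FROM controlled.route_container r
JOIN controlled.route_waypoint rw ON rw.route_pid = r.pid     -- hypothetical join columns
JOIN controlled.waypoint w ON w.pid = rw.waypoint_pid
GROUP BY r.pk, r.pid;
```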
A schema for Field Operations to plan additional activities by creating additional datasets, storing experimental data and prototyping changes to Controlled datasets.
These datasets are managed by Field Operations, and do not need to (but may) follow any conventions or requirements such as naming conventions, required columns, etc.
It's expected that datasets will vary in maturity and importance, with data ranging from fully experimental through to being a definitive source of truth.
Datasets are stored in the `planning_field_ops` database schema. No table schemas etc. are defined for these datasets in this project.
Note: Whilst datasets in this schema are restricted, QGIS Layer Styles that apply to them are not due to a general platform limitation.
Required infrastructure:
- a service or server for running Python applications
- a service or server for running Postgres databases
- an Azure Entra (Active Directory) app registration
- LDAP groups and application user
- a file system for holding Backups
- a Sentry subscription to monitor Automated backups
Required OS packages for Python app server:
- Python >= 3.9
- OpenSSL >= 1.1.1 [1]
- OpenLDAP (including development headers)
- GDAL 3.4 (including development headers and the `gdal-config` binary [2])
- libxml (including the `xmllint` binary)
- libpq (including the `pg_dump` binary)
[1] In exceptional circumstances, the `urllib3` package can be downgraded to use an earlier version of OpenSSL. See the Installation section for more information.
[2] The GDAL OS and Python packages MUST be the same version, and must therefore be version `3.4`.
Required Postgres extensions:
- PostGIS
- pgcrypto
- fuzzystrmatch
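For illustration, the required extensions could be enabled as follows (whether this is done manually or as part of database setup depends on how the database is provisioned):

```sql
-- enable extensions required by the application (needs a suitably privileged account)
CREATE EXTENSION IF NOT EXISTS postgis;
CREATE EXTENSION IF NOT EXISTS pgcrypto;
CREATE EXTENSION IF NOT EXISTS fuzzystrmatch;
```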
A single database, and an account with permissions to create, read, update and delete objects (inc. Postgres schemas) within it, is required to run this application. This database and account can be named anything, but `ops_data_store` and `ops_data_store_app` respectively are recommended as conventional defaults.
In addition, a mechanism must be available for creating and maintaining Postgres roles and users as outlined in abstract in the Permissions section and more specifically in Database Permissions. This must support updating permissions as needed based on the memberships of the relevant LDAP groups.
Required Azure app registration permissions (application assigned):
https://graph.microsoft.com/GroupMember.Read.All
https://graph.microsoft.com/User.Read.All
This app registration will need to be registered within the tenancy that group/team members will be synced from.
LDAP groups are needed for each role in the Permissions system, named as per the mapping table.
These groups must:
- be owned by a user representing this platform, conventionally named `apps_magic_ods`
- all sit within the same OU (e.g. 'OU=groups')
In addition:
- all LDAP users must be contained in a single OU (e.g. 'people' or 'users')
A file system is required for storing application backups and log files.
This file system must:
- be accessible/addressable via the Python `Path` class (i.e. a local, or mounted network, file system)
- provide at least the created date for files via metadata the Python `Path.stat()` method can access
- provide suitable permissions for the Python application to read and write all files (i.e. via an OS user)
- have sufficient space to maintain the number of backups and log files configured (recommended minimum: 5GB)
As this file system is used for backups, it should, to the extent possible, be designed to be stable and reliable.
A Sentry subscription is required for monitoring automated backups with:
- a Sentry team that should receive failure notifications (e.g. `magic`)
- a Sentry project to represent the platform (`ops-data-store`)
Note: Ensure all steps in the Project Setup section are complete before installing.
For the Python application, it is strongly recommended to install this project into a virtual environment:
$ python -m venv /path/to/venv
$ source /path/to/venv/bin/activate
$ python -m pip install --upgrade pip
The Python Package for this project (see Deployment section) requires installing from a private package registry provided by the BAS GitLab instance. This registry requires authentication, however as this project is available publicly, and has been open sourced, the generic deployment token below can be used by anyone to access it.
$ python -m pip install ops-data-store --extra-index-url https://public-access:[email protected]/api/v4/projects/1134/packages/pypi/simple
Note: The `urllib3` dependency must be downgraded if installing on an OS with a version of OpenSSL earlier than 1.1.1:
$ python -m pip install --upgrade urllib3==1.26.18
The control CLI can be used to check the application has been installed correctly and is the expected version:
# if installed in a virtual environment
$ source /path/to/venv/bin/activate
$ ods-ctl --version
0.1.0
Optionally, the `ods-ctl` command can be added to the PATH, or symlinked to a location already in the PATH (such as `~/bin/`), to make it easier to call:
$ mkdir ~/bin
$ cd ~/bin
$ ln -s /path/to/venv/bin/ods-ctl ods-ctl
$ cd ~
You may need to add `~/bin` to your `PATH` by editing:
$ vi ~/.bash_profile
And adding:
# User specific environment and startup programs
PATH=$PATH:$HOME/.local/bin:$HOME/bin
export PATH
Optionally, create a `.env` file to set Configuration options, or use the relevant environment variables. See the Infrastructure section for connection information to use.
Note: If using an `.env` file, the application will look in the parent directory of where the application is installed (i.e. the virtual environment) and its parents, up to the root directory (i.e. `/`). Common directories for configuration files such as `/etc/` and `~/.config/` are NOT checked.
To check the application configuration is valid and as expected:
$ ods-ctl config check
Ok. Configuration valid.
$ ods-ctl config show
To check the application database is available, and then configure it for use:
$ ods-ctl db check
Ok. DB connection successful.
$ ods-ctl db setup
Setting up database for first time use.
Note: If this command fails, please either create an issue in the 'Ops Data Store' project in GitLab, or contact MAGIC at [email protected] with the output of this command.
Ok. Database setup complete.
Create required schemas and empty controlled datasets in the database by running the `schemas.sql` and `datasets-controlled.sql` files against the database:
$ ods-ctl db run --input-path resources/db/schemas.sql
$ ods-ctl db run --input-path resources/db/datasets-controlled.sql
Create database roles and uses needed for the Permissions system and apply required Database Grants, either directly against the database or by incorporating into a permissions management mechanism.
- within the File Store create the relevant directory with suitable permissions for:
- style authors to write to it
- the Web Server to serve files from it anonymously
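A minimal sketch of creating such a directory is shown below; the path, group name and permissions model are assumptions for illustration, not project conventions:
# path and group are placeholders - adjust to the instance's File Store and style author group
$ sudo mkdir -p /data/ops-data-store/layer-styles
$ sudo chgrp style-authors /data/ops-data-store/layer-styles
$ sudo chmod 2775 /data/ops-data-store/layer-styles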
If using an instance for syncing users and assigning dataset permissions, check Azure and LDAP are available:
$ ods-ctl auth check
Ok. Auth connection successful.
Note: If this command fails, please check the configured credentials and external connectivity. If the problem persists, create an issue in the 'Ops Data Store' project in GitLab, or contact MAGIC at [email protected] with the output of this command.
Then sync Azure groups to LDAP:
$ ods-ctl auth sync -ag 34db44b7-4441-4f60-8daa-d0c192d74704 -lg apps_magic_ods_admin
$ ods-ctl auth sync -ag 75ec55c1-7e92-45e3-9746-e50bd71fcfef -lg apps_magic_ods_write_fo
$ ods-ctl auth sync -ag 7b8458b9-dc90-445b-bff8-2442f77d58a9 -lg apps_magic_ods_write_au
$ ods-ctl auth sync -ag 34db44b7-4441-4f60-8daa-d0c192d74704 -ag 75ec55c1-7e92-45e3-9746-e50bd71fcfef -ag 7b8458b9-dc90-445b-bff8-2442f77d58a9 -lg apps_magic_ods_read
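To spot-check the result of a sync, the members of a target LDAP group can be listed, for example with ldapsearch (see the LDAP examples in the Troubleshooting section for the [Server] and [Bind DN] placeholders; the base DN below follows the example convention used there):
$ ldapsearch -x -H [Server] -D [Bind DN] -W -b "cn=apps_magic_ods_read,ou=groups,dc=example,dc=com" member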
To monitor application backups via Sentry, install the Sentry CLI:
$ mkdir ~/bin
$ export INSTALL_DIR=./bin
$ curl -sL https://sentry.io/get-cli/ | sh
Note: The CLI can also be installed manually.
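To confirm the CLI is installed where expected:
$ ~/bin/sentry-cli --version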
Create a Sentry cron monitor within the relevant Sentry subscription:
- project: ops-data-store
- name/slug: ods-backups
- schedule type: cron
- cron pattern: 0 4 * * *
- cron timezone: UTC
- grace period: 1 minute
- max runtime: 2 minutes
- notify: #magic
- failure tolerance: 1
- recovery tolerance: 1
- environment: All Environments
Once created, edit the monitor's associated alert to set:
- conditions: post message to #dev channel in MAGIC Slack (in addition to regular team notification)
- action interval: 5 minutes
- alert owner: #magic
Create directory for cron backup logs:
$ mkdir -p ~/logs/cron/
Add a new cron job using crontab -e:
SHELL=/bin/bash
[email protected]
## Operations Data Store automated backups - https://gitlab.data.bas.ac.uk/MAGIC/ops-data-store#backups-automation
0 4 * * * SENTRY_DSN=[Sentry DSN] /users/ods/bin/sentry-cli monitors run -e [Sentry ENV] ods-backups -- /var/opt/ops-data-store/venv/bin/ods-ctl backup now >> /users/ods/logs/cron/ods-backup-$(date +\%Y-\%m-\%d-\%H-\%M-\%S-\%Z).log 2>&1
0 0 * * * /usr/bin/find /users/ods/logs/cron/ -name 'ods-backup-*.log' -type f -mtime +30 -delete
Replace [Sentry DSN] and [Sentry ENV] with the relevant secret and per-instance/environment label (e.g. rothera-production) respectively.
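Before relying on cron, the wrapped command can be run once manually to confirm a check-in appears against the ods-backups monitor in Sentry. This is a sketch based on the crontab entry above, using the example environment label:
$ SENTRY_DSN=[Sentry DSN] /users/ods/bin/sentry-cli monitors run -e rothera-production ods-backups -- /var/opt/ops-data-store/venv/bin/ods-ctl backup now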
Create a Sentry cron monitor within the relevant Sentry subscription:
- project: ops-data-store
- name/slug: ods-conversion
- schedule type: cron
- cron pattern: */5 * * * *
- cron timezone: UTC
- grace period: 5 minutes
- max runtime: 4 minutes
- notify: #magic
- failure tolerance: 1
- recovery tolerance: 1
- environment: All Environments
Once created, edit the monitor's associated alert to set:
- conditions: post message to #dev channel in MAGIC Slack (in addition to regular team notification)
- action interval: 5 minutes
- alert owner: #magic
Create a directory for cron logs (if not already created above):
$ mkdir -p ~/logs/cron/
Add a new cron job using crontab -e:
SHELL=/bin/bash
[email protected]
## Operations Data Store automated conversion - https://gitlab.data.bas.ac.uk/MAGIC/ops-data-store#automatic-conversion
*/5 * * * * SENTRY_DSN=[Sentry DSN] /users/ods/bin/sentry-cli monitors run -e [Sentry ENV] ods-conversion -- /var/opt/ops-data-store/venv/bin/ods-ctl data convert >> /users/ods/logs/cron/ods-conversion-$(date +\%Y-\%m-\%d-\%H-\%M-\%S-\%Z).log 2>&1
0 0 * * * /usr/bin/find /users/ods/logs/cron/ -name 'ods-conversion-*.log' -type f -mtime +0 -delete
Replace [Sentry DSN] and [Sentry ENV] with the relevant secret and per-instance/environment label (e.g. rothera-production) respectively.
Note: This section is a work in progress and may be restructured.
- save an ad-hoc layer style in QGIS to create the QGIS public.layer_styles table in the new instance
- copy existing QGIS layer styles from an existing instance to the new instance
Note: This section is a work in progress and may be restructured.
- check test data can be created using the application user in QGIS
- it is suggested to add a test set of waypoints and a route to verify the Air Unit conversion is working
- check test data can be created using an LDAP user
- check the web server can be accessed using an LDAP login
- check the Air Unit conversion process is working (requires some test waypoints/routes and the webserver)
- verify the backup Sentry monitor is running and ok for the new instance
- verify the Air Unit conversion Sentry monitor is running and ok for the new instance
To upgrade the Python application, upgrade the Python package version using Pip:
# if installed in a virtual environment
$ source /path/to/venv/bin/activate
$ python -m pip install --upgrade ops-data-store --extra-index-url https://public-access:[email protected]/api/v4/projects/1134/packages/pypi/simple
Check the version is as expected:
$ ods-ctl --version
0.2.0
Review that Configuration options are set correctly and check the application configuration is still valid:
$ ods-ctl config check
Ok. Configuration valid.
Connection details for any resources created should be stored in the MAGIC 1Password shared vault.
- create app registration with the required permissions
- request NERC DDaT to approve required permissions
- optionally configure branding name, logo and internal note referencing this README
- create project with package registry and CI/CD enabled
- create deployment token for allowing anyone to install packages:
- name: "Public Access"
- username: "public-access"
- scopes: read_package_registry
- request an application server for running Python applications
- where file quotas are enforced for home directories, request 10GB (see MAGIC/ops-data-store#183 π‘οΈ for context)
- request a Postgres database with required extensions
- request a Windows VM (configured as a BAS workstation) with QGIS LTS installed to act as a reference VM
- request an LDAP entity to use for managing application LDAP groups
- request LDAP groups as needed for implementing application permissions
- request LDAP users for application, specifically:
- request SAN/data volume mounted in the application server with permissions for the Python app OS user to read/write
- request a user synchronisation mechanism between LDAP servers and between LDAP and Postgres
- request that example users for each database role are created and documented in 1Password
- request that relevant grants be applied to allow the Postgres database owner to grant permissions to objects it contains
There are several instances and environments for this project:
Environments:
- Development - for developing the Data Store (see Development section)
- Staging - for pre-release testing and experimentation.
- Production - for real use.
Instances:
- Cambridge (Production) - managed by BAS IT as general infrastructure
- Rothera (Staging) - managed by BAS IT in the wider SDA/stations staging infrastructure
- Rothera (Production) - managed by BAS IT in the on-station infrastructure
The infrastructure needed for each instance can be summarised by this diagram:
Some of the infrastructure used by this project is environment or instance specific. Other elements are shared across all instances/environments, as shown in this diagram:
- Cambridge (Production) π
- see MAGIC/ops-data-store#170 π‘ for initial setup
- Rothera (Staging) π
- see MAGIC/ops-data-store#39 π‘ for initial setup
- Rothera (Production) π
- see MAGIC/ops-data-store#40 π‘ for initial setup
- Cambridge (Production) π
- see MAGIC/ops-data-store#170 π‘ for initial setup
- Rothera (Staging) π
- see MAGIC/ops-data-store#39 π‘ for initial setup
- Rothera (Production) π
- see MAGIC/ops-data-store#40 π‘ for initial setup
ods_app_eo_acq_script:
- Rothera (Staging) π
- see MAGIC/ops-data-store#218 π‘οΈ for initial setup
- Rothera (Staging) π
Used across all instances/environments. See MAGIC/operations/field-operations-gis-data#50 π‘ for initial setup with UKRI.
Used to simulate an end user computer used by Operations, acting as a known working example for debugging and testing. Currently configured to use the Rothera (Staging) platform instance.
Symptoms:
An error occurs with the GeoPackage backup made when running the backup now command.
Prerequisites:
- access to the relevant instance
- GDAL binaries (ogr2ogr)
- connection details for the relevant database
Method:
- in the relevant instance, create an export directly using the data export command
- if successful, check the state of the backup set (path available using the config show command):
  - remove controlled_datasets_backup.gpkg if it exists
- if unsuccessful, use a test script, similar to the one attached in [1], to test the GDAL Python bindings in isolation
- if unsuccessful, use the GDAL CLI tools to export to a GeoPackage manually
- try exporting layers into individual files first [2]
- then if successful, export all layers into the same file [3]
Verify any data using QGIS.
Context:
References:
[1] https://gitlab.data.bas.ac.uk/MAGIC/ops-data-store/-/issues/170#note_112581
[2]
$ ogr2ogr -f GPKG test.gpkg PG:"host=[hostname] dbname=[database] user=[username] password=[password]" "[table]"
E.g.:
$ ogr2ogr -f GPKG test.gpkg PG:"host=db.example.com dbname=ops-data-store user=ops-data-store password=xxx" "depot"
[3]
E.g.:
$ ogr2ogr -f GPKG test2.gpkg PG:"host=db.example.com dbname=ops-data-store user=ops-data-store password=xxx" "depot"
$ ogr2ogr -f GPKG -append test2.gpkg PG:"host=db.example.com dbname=ops-data-store user=ops-data-store password=xxx" "instrument"
$ ogr2ogr -f GPKG -append test2.gpkg PG:"host=db.example.com dbname=ops-data-store user=ops-data-store password=xxx" "..."
$ ogr2ogr -f GPKG -append test2.gpkg PG:"host=db.example.com dbname=ops-data-store user=ops-data-store password=xxx" "layer_styles"
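As an additional check before opening the file in QGIS, GDAL's ogrinfo tool (installed alongside ogr2ogr) can list the layers written to the GeoPackage:
$ ogrinfo test2.gpkg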
$ ldapsearch -x -H [Server] -D [Bind DN] -W -b "[Base DN]" "(uid=[User])"
Where [Base DN] is [OU people],[Base DN] (e.g. "ou=people,dc=example,dc=com") and [User] is the user to find (e.g. conwat).
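For example, with purely illustrative server and bind DN values:
$ ldapsearch -x -H ldaps://ldap.example.com -D "cn=admin,dc=example,dc=com" -W -b "ou=people,dc=example,dc=com" "(uid=conwat)"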
$ ldapsearch -x -H [Server] -D [Bind DN] -W -b "[Base DN]" member
Where [Base DN] is [Group name with prefix],[OU groups],[Base DN] (e.g. "cn=foo,ou=groups,dc=example,dc=com").
Remove all members of a group except one, as LDAP groups cannot be empty:
$ ldapmodify -x -H [Server] -D [Bind DN] -W -f modify_group.ldif
Where modify_group.ldif is, for example:
dn: cn=foo,ou=groups,dc=example,dc=com
changetype: modify
delete: member
member: uid=conwat,ou=users,dc=example,dc=com
...
dn: cn=foo,ou=groups,dc=example,dc=com
changetype: modify
delete: member
member: uid=...,ou=users,dc=example,dc=com
Check out project:
$ git clone https://gitlab.data.bas.ac.uk/MAGIC/ops-data-store.git
$ cd ops-data-store
Note: If you do not have access to the BAS GitLab instance, clone from GitHub as a read-only copy instead.
Poetry is used for managing the Python environment and dependencies.
pyenv (brew install pyenv) is strongly recommended to ensure the Python version matches deployed instances. E.g. where the deployed Python version is 3.9.18:
$ pyenv install 3.9.18
$ pyenv local 3.9.18
$ poetry install
A local Postgres database (brew install postgis) is recommended to test storing Datasets:
$ brew services start postgresql
$ psql -d postgres -c 'CREATE DATABASE "ops-data-store-dev";'
$ psql -d postgres -c 'COMMENT ON DATABASE "ops-data-store-dev" IS '\''Ops Data Store local development'\'';'
Note: This database name is a convention and used in SQL files that will be run later. If changed, all references will need amending locally.
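As an optional sanity check that the PostGIS extension is available to the local cluster before continuing:
$ psql -d ops-data-store-dev -c "SELECT name, default_version FROM pg_available_extensions WHERE name = 'postgis';"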
It's strongly recommended to set required configuration options using a .env file, based off the .example.env file as a reference.
A .test.env file MUST be created as per the Testing Configuration section.
In QGIS, create the public.layer_styles table by creating an ad-hoc layer and saving its style to the database (the created style and ad-hoc layer can be removed once the table is created).
Set up the database, create schemas and assign permissions for Controlled Datasets:
$ poetry run ods-ctl db run --input-path resources/db/roles.sql
$ poetry run ods-ctl db run --input-path tests/resources/db/users.sql
$ poetry run ods-ctl db setup
$ poetry run ods-ctl db run --input-path resources/db/schemas.sql
$ poetry run ods-ctl db run --input-path resources/db/datasets-controlled.sql
$ poetry run ods-ctl db run --input-path tests/resources/db/grants.sql
The QGIS profile used for testing needs downloading π‘ from the GitLab package registry (as it's too large to sensibly store in Git).
Once downloaded, extract and rename to ops-data-store. Then copy to the relevant QGIS profile directory:
- macOS: ~/Library/Application\ Support/QGIS/QGIS3/profiles/
- Windows: %APPDATA%\Roaming\QGIS\QGIS3\profiles/
$ poetry run ods-ctl [COMMAND] [ARGS]
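For example, to run the configuration check against the local development environment:
$ poetry run ods-ctl config check
Ok. Configuration valid.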
All code changes should be:
- made using a merge request associated with an issue in the project's Issue Tracker π‘
- summarised in the Change log
For consistency, it is strongly recommended to configure your IDE or other editor to use the EditorConfig settings defined in .editorconfig.
- except for tests, all Python code should be contained in the ops_data_store package
- use Path.resolve() if displaying or logging file/directory paths in Python
- Python dependencies are managed with Poetry in pyproject.toml
- configuration options should be defined in the common Config class and this README, in alphabetical order
- configuration options should be read using the relevant environs helper method
- use logging to record how actions progress, using the app logger (e.g. logger = logging.getLogger('app'))
The Safety package is used to check dependencies against known vulnerabilities.
WARNING! As with all security tools, Safety is an aid for spotting common mistakes, not a guarantee of secure code. In particular, this project uses the free vulnerability database, which is updated less frequently than paid options.
Checks are run automatically in Continuous Integration. To check locally:
$ poetry run safety scan
The GDAL Python bindings are tied to the GDAL library version.
BAS IT managed servers currently use an older version of GDAL (3.4.3), which this project is therefore required to target. Installing this older version in a development environment is often unsupported due to its age, causing an incompatibility.
Running with the 3.4.3 Python bindings but a newer Library version will give errors such as:
src/ops_data_store/data.py:4: in <module>
from osgeo.gdal import (
.venv/lib/python3.9/site-packages/osgeo/__init__.py:45: in <module>
_gdal = swig_import_helper()
.venv/lib/python3.9/site-packages/osgeo/__init__.py:42: in swig_import_helper
return importlib.import_module('_gdal')
E ModuleNotFoundError: No module named '_gdal'
The correct fix for this problem is to upgrade the version of GDAL available on BAS IT managed servers. In the meantime a workaround can be used to locally upgrade the Python bindings to match a newer Library version but not commit the updated Poetry lock file.
This creates a problem when other dependencies are added or updated as the upgraded Python bindings package will also be included, and partial updates cannot be committed because the file includes a checksum. To work around this problem the following workflow can be used:
- make other changes as needed
- downgrade the GDAL dependency [1]
- commit the Poetry lock file
- upgrade the GDAL dependency (but do not commit change) [2]
[1]
$ poetry remove gdal
$ poetry add gdal@3.4.3
[2]
$ poetry remove gdal
$ poetry add gdal
(Adapt the second add gdal command to the version of GDAL needed locally.)
Ruff is configured to run Bandit, a static analysis tool for Python.
WARNING! As with all security tools, Bandit is an aid for spotting common mistakes, not a guarantee of secure code. In particular, this tool can't check for issues that are only detectable when running code.
Ruff is used to lint and format Python files. Specific checks and config options are set in pyproject.toml. Linting checks are run automatically in Continuous Integration.
To check locally:
$ poetry run ruff check src/ tests/
$ poetry run ruff format --check src/ tests/
To format files:
$ poetry run ruff format src/ tests/
If using a local Postgres database installed through Homebrew (where @14 is the version installed):
- to manage the service: brew services [command] postgresql@14
- to view logs: /usr/local/var/log/postgresql@14.log
- add new properties to the ops_data_store.config.Config class
- include new properties in the ops_data_store.config.Config.dump() method
- if relevant, update the ops_data_store.config.Config.validate() method
- if relevant, update .env files and templates
- if relevant, update .gitlab-ci.yml variables
- add a fixture to tests.conftest
- include the fixture in the tests.conftest.fx_test_config_dict fixture
- update the tests.ops_data_store_tests.test_config module
All 1st party Python code in the ops_data_store package must be covered by tests, defined in the ops_data_store_tests package.
pytest is used as the test framework, configured in pyproject.toml.
Fixtures should be defined in conftest.py, prefixed with fx_ to indicate they are a fixture, e.g.:
import pytest

@pytest.fixture()
def fx_test_foo() -> str:
    """Example of a test fixture."""
    return 'foo'
Test coverage is checked with pytest-cov, with an aim for 100% coverage (with some exceptions). Exemptions for coverage should be used sparingly and only with good justification. Where tests are added to ensure coverage, the cov mark should be added, e.g.:
import pytest

@pytest.mark.cov
def test_foo():
    assert 'foo' == 'foo'
An additional .test.env file is used to override some application config properties, such as the database. This file can be created from the .example.test.env reference file.
There is a known and major limitation with tests mocking interactions with the database leading to bugs such as MAGIC/ops-data-store#124 π‘οΈ.
See MAGIC/ops-data-store#159 π‘ for progress to address this.
Tests and coverage checks are run automatically in Continuous Integration. To check locally:
$ poetry run pytest --strict-markers --random-order --cov --cov-report=html tests
All commits will trigger Continuous Integration using GitLab's CI/CD platform, configured in .gitlab-ci.yml
.
The Docker image used for CI is needed to ensure the correct version of GDAL is used, which is then configured to use the correct Python version.
This project is distributed as a Python package installable from a private package registry provided by the BAS GitLab instance. It is built by Poetry automatically as part of Continuous Deployment. If needed it can also be built manually:
$ poetry build
If the QGIS profile used for testing needs updating, the generic package stored in GitLab can be updated:
$ curl --header "PRIVATE-TOKEN: $BAS_GITLAB_TOKEN" --upload-file resources/qgis/qgis-profile.zip https://gitlab.data.bas.ac.uk/api/v4/projects/1134/packages/generic/qgis-profile/[version]/qgis-profile.zip
Where [version] is replaced with a calendar based version YYYY-MM-DD.N, e.g. the first release on April 12th 2023 would become 2023-04-12.0. A second release that day would be 2023-04-12.1, etc.
For example:
$ curl --header "PRIVATE-TOKEN: $BAS_GITLAB_TOKEN" --upload-file resources/qgis/qgis-profile.zip https://gitlab.data.bas.ac.uk/api/v4/projects/1134/packages/generic/qgis-profile/2023-04-12.0/qgis-profile.zip
Tagged commits will trigger Continuous Deployment using GitLab's CI/CD platform, configured in .gitlab-ci.yml
.
To create a release, create an issue using the release issue template and follow the included checklist.
This project is maintained by the BAS Mapping and Geographic Information Centre (MAGIC), contactable at: [email protected].
Copyright (c) 2023-2024 UK Research and Innovation (UKRI), British Antarctic Survey (BAS).
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.