
Linked Data Prototype - Data

The following full stack has been implemented and is available in this repository:
[Full-stack architecture diagram]

The full report for this project is available here (Document in French).
The project was carried out by the Data Semantics Lab of the HES-SO Valais/Wallis - Institute of Informatics - Sierre.

The sections below explain the data transformation, the data validation with SHACL, and the storage of the data in two triple stores with client interrogation of the SPARQL endpoints.

The proxy used to secure data access is described and available in this repository.

Data transformation

Fictional datasets were created to simulate UPI and EWR information.

The chosen data models are the SEMIC core vocabularies.

UPI and EWR data - CSV to RDF transformation
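
As an illustration of the idea only (not the repository's actual transformation scripts: the CSV file name, the column names, and the exact mapping onto the Core Person vocabulary are all assumptions), a CSV row can be turned into RDF triples with rdflib along these lines:

    # Illustrative CSV-to-RDF sketch. Column names ("id", "givenName",
    # "familyName") and the input file name are hypothetical.
    import csv
    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import FOAF, RDF

    PERSON = Namespace("http://www.w3.org/ns/person#")  # W3C person vocabulary, reused by SEMIC Core Person
    EX = Namespace("http://example.org/person/")         # placeholder base URI

    g = Graph()
    g.bind("person", PERSON)
    g.bind("foaf", FOAF)

    with open("UPI_Personnes_fiction.csv", newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            subject = EX[row["id"]]
            g.add((subject, RDF.type, PERSON.Person))
            g.add((subject, FOAF.givenName, Literal(row["givenName"])))
            g.add((subject, FOAF.familyName, Literal(row["familyName"])))

    g.serialize(destination="UPI_Personnes_fiction.ttl", format="turtle")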

UPI data - XML to RDF transformation

The XML2RDF folder contains two example tools that demonstrate how XML can be transformed into RDF.

Both examples contain a run.sh script to transform the file persons-eCH0044.xml. The same XML file is used in both examples; it contains five fictional characters that complement the UPI dataset generated above. A minimal sketch of such a transformation is shown below.
More information about those tools will be provided soon.
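
As a minimal sketch of the approach (this is not one of the repository's tools, and the element names below are simplified assumptions, not the exact eCH-0044 schema), an XML-to-RDF transformation can be written with Python's standard XML parser and rdflib:

    # Minimal XML-to-RDF sketch. "person" and "officialName" are assumed
    # element names; the repository's actual tools may work differently.
    import xml.etree.ElementTree as ET
    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import FOAF, RDF

    PERSON = Namespace("http://www.w3.org/ns/person#")
    EX = Namespace("http://example.org/person/")  # placeholder base URI

    g = Graph()
    tree = ET.parse("persons-eCH0044.xml")
    for i, person in enumerate(tree.getroot().iter("person")):
        subject = EX[f"xml-{i}"]
        g.add((subject, RDF.type, PERSON.Person))
        family_name = person.findtext("officialName")
        if family_name:
            g.add((subject, FOAF.familyName, Literal(family_name)))

    print(g.serialize(format="turtle"))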

Data validation with SHACL

RDF data can be validated with the W3C standard Shapes Constraint Language (SHACL).

The generated UPI and EWR datasets, presented above, are based on the SEMIC ontologies and can be validated with the SHACL files provided by SEMIC.

To run the SHACL validation we use the Apache Jena implementation (Apache Jena Commands, version 5.1.0).

The shacl folder contains the necessary tools and files to perform the SHACL validation:

  • The Jena tool unzipped in the apache-jena-5.1.0 sub-folder
  • The two generated datasets UPI_Personnes_fiction.ttl and EWR_ResidencesPrincipales.ttl
  • The SHACL files core-person-ap-SHACL.ttl and cccev-ap-SHACL.ttl
    Note: the cccev-ap-SHACL.ttl was adapted to cccev-ap-SHACL_corrected.ttl to avoid mixing http and https URLs for the time ontology
  • runUPI.sh and runEWR.sh to execute the SHACL validation (see the sketch after this list)
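
For illustration, the same check can be reproduced in Python with the pySHACL library. Note that this is an alternative to the Apache Jena CLI that the repository actually uses, and the Jena invocation shown in the comment is an assumption about what runUPI.sh contains:

    # Illustrative alternative to the Jena CLI used by runUPI.sh/runEWR.sh.
    # Those scripts presumably wrap something like:
    #   apache-jena-5.1.0/bin/shacl validate --shapes core-person-ap-SHACL.ttl --data UPI_Personnes_fiction.ttl
    # (the exact flags are an assumption). The pySHACL check below validates
    # the same files; install with: pip install pyshacl
    from pyshacl import validate

    conforms, report_graph, report_text = validate(
        "UPI_Personnes_fiction.ttl",              # data graph to validate
        shacl_graph="core-person-ap-SHACL.ttl",   # SEMIC Core Person AP shapes
    )
    print("Conforms:", conforms)
    print(report_text)

For the EWR dataset, the same call applies with EWR_ResidencesPrincipales.ttl and cccev-ap-SHACL_corrected.ttl.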

Data stored in two triple stores and client interrogation of the SPARQL endpoints

The Python POC sends SPARQL queries to two local SPARQL endpoints (EWR and UPI) that can easily be launched locally.
The Python code relies on rdflib (Code and Documentation).

The POC was run on Ubuntu:

  • Launch the EWR SPARQL endpoint with startEWR.sh
    The endpoint is published on http://localhost:8000/

  • Launch the UPI SPARQL endpoint with startUPI.sh
    The endpoint is published on http://localhost:8001/

  • Run the client code with three parameters:

    python3 serafe_sparql_query.py --queryNumber 5 --ewr_endpoint http://localhost:8000/ --upi_endpoint http://localhost:8001/
    

    Use the queryNumber parameter to choose which query to run:
    1: federated SPARQL
    2: two queries
    3: multiple queries
    4: Wikidata dereferencing
    5: all of the above

    To display information about the expected parameters, run the script without arguments:

    python3 serafe_sparql_query.py
    

Remark: the client code sends SPARQL queries to the endpoints passed as parameters, so this POC would work with any SPARQL endpoint (and any triple store) that hosts the data. A sketch of such a federated query follows.
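
As an illustration of the kind of federated query the POC sends (queryNumber 1), here is a minimal sketch using rdflib's SERVICE support. The person class and the residence predicate are placeholders, not the POC's actual vocabulary:

    # Federated SPARQL sketch in the spirit of queryNumber 1. The predicate
    # <http://example.org/hasMainResidence> is a placeholder. rdflib evaluates
    # SERVICE clauses by sending the sub-patterns to the remote endpoints.
    from rdflib import Graph

    QUERY = """
    SELECT ?person ?residence WHERE {
      SERVICE <http://localhost:8001/> {   # UPI endpoint
        ?person a <http://www.w3.org/ns/person#Person> .
      }
      SERVICE <http://localhost:8000/> {   # EWR endpoint
        ?person <http://example.org/hasMainResidence> ?residence .
      }
    }
    """

    for row in Graph().query(QUERY):
        print(row.person, row.residence)

This mirrors the remark above: only the endpoint URLs appear in the query, so any SPARQL endpoint hosting the data would work.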
