The following full stack has been implemented and is available in this repository:
The full report for this project is available here (Document in French).
The project was done by the Data Semantics Lab of the HES-SO Valais/Wallis - Institute of Informatics - Sierre
Find below explanations about:
- Original data and data transformation
- Data validation
- Data stored in two triple stores and client queries on the SPARQL endpoints
The proxy to secure the data access is described and available in this repository
Fictitious datasets were created to simulate UPI and EWR information.
The chosen data models are the SEMIC core vocabularies (a minimal modelling sketch follows the list below):
- Core Person Vocabulary for the UPI dataset of people's basic information
- Core Criterion and Core Evidence Vocabulary for the EWR dataset of people's principal residences
- The UPI dataset creation is based on the tarql tool. Tarql can be run on Windows or Linux with `run.bat` and `run.sh` respectively (found here)
- The EWR dataset creation is based on tarql. Tarql can be run on Windows or Linux with `run.bat` and `run.sh` respectively (found here)
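To give an idea of the target data model, here is a minimal sketch (not code from this repository) that builds one fictitious person with rdflib, assuming the Core Person Vocabulary namespace http://www.w3.org/ns/person# and FOAF name properties; the exact classes, properties, and URI patterns produced by the tarql mappings may differ.

```python
# Hedged sketch: the namespace and properties are assumptions,
# not the repository's actual tarql output.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import FOAF, RDF, XSD

PERSON = Namespace("http://www.w3.org/ns/person#")  # Core Person Vocabulary

g = Graph()
g.bind("person", PERSON)
g.bind("foaf", FOAF)

# One fictitious person, comparable to one row of the UPI CSV after transformation
p = URIRef("http://example.org/person/1")
g.add((p, RDF.type, PERSON.Person))
g.add((p, FOAF.givenName, Literal("Anna")))
g.add((p, FOAF.familyName, Literal("Muster")))
g.add((p, URIRef("http://schema.org/birthDate"), Literal("1980-01-01", datatype=XSD.date)))

print(g.serialize(format="turtle"))
```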
The `XML2RDF` folder contains examples of tools to demonstrate how XML can be transformed to RDF:
- The first example is based on sparql-generate
- The second example is based on rocketRML
See the tool's documentation for its installation, which might require node.js and npm.
Both examples contain a `run.sh` to transform the file `persons-eCH0044.xml`. It is the same XML file in both examples; it contains 5 fictional characters as a complement to the UPI dataset generated above.
More information will be given about those tools soon.
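Independently of these two tools, the general idea of an XML to RDF transformation can be sketched in a few lines of Python: parse an XML person element and emit the corresponding triples with rdflib. The element names below are illustrative assumptions and do not reproduce the eCH-0044 schema or the mappings used by the examples.

```python
# Illustrative sketch only: element names and output vocabulary are assumptions;
# the repository uses sparql-generate / RocketRML instead.
import xml.etree.ElementTree as ET
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import FOAF, RDF

PERSON = Namespace("http://www.w3.org/ns/person#")

xml_snippet = """
<persons>
  <person id="1">
    <givenName>Anna</givenName>
    <familyName>Muster</familyName>
  </person>
</persons>
"""

g = Graph()
root = ET.fromstring(xml_snippet)
for elem in root.findall("person"):
    subject = URIRef(f"http://example.org/person/{elem.get('id')}")
    g.add((subject, RDF.type, PERSON.Person))
    g.add((subject, FOAF.givenName, Literal(elem.findtext("givenName"))))
    g.add((subject, FOAF.familyName, Literal(elem.findtext("familyName"))))

print(g.serialize(format="turtle"))
```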
RDF data can be validated with the W3C standard Shapes Constraint Language (SHACL).
The generated UPI and EWR datasets, presented above, are based on the SEMIC ontologies and can be validated with their provided SHACL files:
- UPI dataset based on the Core Person Vocabulary, find the SHACL file here.
- EWR dataset based on the Core Criterion and Core Evidence Vocabulary, find the SHACL file here.
To run the SHACL validation we use the Apache Jena implementation (Apache Jena Commands, version 5.1.0).
The shacl folder contains the necessary tools and files to perform the SHACL validation:
- The Jena tool unzipped in the `apache-jena-5.1.0` sub-folder
- The two generated datasets `UPI_Personnes_fiction.ttl` and `EWR_ResidencesPrincipales.ttl`
- The SHACL files `core-person-ap-SHACL.ttl` and `cccev-ap-SHACL.ttl`
  Note: the `cccev-ap-SHACL.ttl` was adapted to `cccev-ap-SHACL_corrected.ttl` to avoid mixing http and https URLs for the time ontology
- `runUPI.sh` and `runEWR.sh` to execute the SHACL validation
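As an alternative to the Jena command line wrapped by `runUPI.sh` and `runEWR.sh`, the same check can be sketched in Python with the pyshacl library (an extra dependency, not part of this repository); the file names are those listed above.

```python
# Sketch of the validation with pyshacl instead of the Jena CLI
# (pyshacl is an assumption here, not shipped with this repository).
from pyshacl import validate

conforms, _, results_text = validate(
    data_graph="UPI_Personnes_fiction.ttl",
    shacl_graph="core-person-ap-SHACL.ttl",
    data_graph_format="turtle",
    shacl_graph_format="turtle",
    inference="rdfs",
)
print("Conforms:", conforms)
print(results_text)
```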
The Python POC sends SPARQL queries to two local SPARQL endpoints (EWR and UPI) that can easily be launched locally.
The Python code relies on rdflib (Code and Documentation)
The POC was run on Ubuntu:
- Launch the EWR SPARQL endpoint with `startEWR.sh`. The endpoint is published on http://localhost:8000/
- Launch the UPI SPARQL endpoint with `startUPI.sh`. The endpoint is published on http://localhost:8001/
- Run the client code with 3 parameters:
  python3 serafe_sparql_query.py --queryNumber 5 --ewr_endpoint http://localhost:8000/ --upi_endpoint=http://localhost:8001/
Use the parameter queryNumber to choose which query to run:
1 Federated SPARQL
2 Two queries
3 Multiple queries
4 Wikidata dereferencing
5 All

To display the information about the expected parameters:
python3 serafe_sparql_query.py
Remark: The client code sends SPARQL queries to the SPARQL endpoints passed as parameters; this POC would thus work with any SPARQL endpoint (and any triple store) that hosts the data.
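As an illustration of that remark, here is a sketch (not the repository's serafe_sparql_query.py) that queries the two endpoints with plain rdflib and dereferences a Wikidata URI; the query texts are deliberately generic and would have to be adapted to the UPI/EWR vocabularies described above.

```python
# Illustrative sketch only: endpoint URLs as above, queries and vocabularies assumed.
from rdflib import Graph
from rdflib.plugins.stores.sparqlstore import SPARQLStore

upi = Graph(store=SPARQLStore(query_endpoint="http://localhost:8001/"))
ewr = Graph(store=SPARQLStore(query_endpoint="http://localhost:8000/"))

# One query per endpoint (the "Two queries" idea); results can be joined in Python
for row in upi.query("SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"):
    print("UPI:", row.s, row.p, row.o)
for row in ewr.query("SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"):
    print("EWR:", row.s, row.p, row.o)

# Wikidata dereferencing (the idea behind queryNumber 4): fetch RDF for one entity
wd = Graph()
wd.parse("http://www.wikidata.org/entity/Q72")  # Q72 = Zurich, illustrative only
print(len(wd), "triples fetched from Wikidata")
```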