4. Update

Dénes Csala edited this page Nov 10, 2024 · 1 revision

Structure

The data update process is governed through Jupyter notebooks.

  • The formatter handles the daily data updates. It is automated and set up to run daily. It covers:

    • Welcome dashboard update, processed through the Grafana backend
    • News and opportunities
    • Legal map

    Furthermore, an automated push to the data repository is also made by @gembott.

  • The longterm handles everything else. It is triggered manually but is set up to run at least weekly. All updates are written to the InfluxDB backend and are immediately reflected in the Grafana frontend; the Welcome dashboard update is processed through the Grafana backend. This stage does not include an automated data repository push, since any new data is picked up by the daily update the next day at the latest. On top of the daily updates, this stage includes:

    • APS data
    • NES data
    • Maps
      • APS
      • NES
      • RO regional
    • RO stats
      • Data & Maps
    • Past GEM reports
    • Legal changes
    • NES and RO stats Radar charts
    • APS Scatter plot
    • Executive summary
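
The daily run described above can be automated with a simple scheduler entry. As a sketch only: the notebook filename (`formatter.ipynb`), its path, and the log location below are assumptions for illustration, not confirmed names from this project. A crontab line executing the notebook headlessly via `jupyter nbconvert` might look like:

```shell
# Hypothetical crontab entry: run the daily formatter notebook at 03:00 every
# day, executing all cells in place and appending output to a log file.
# (Notebook name, directory, and log path are placeholders.)
0 3 * * * cd /home/gem/notebooks && jupyter nbconvert --to notebook --execute --inplace formatter.ipynb >> /var/log/gem-daily.log 2>&1
```

`jupyter nbconvert --execute` runs every cell top to bottom, so the notebook must be written to run cleanly without manual intervention.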

Data push

Throughout GEM, we use @gembott for automated data pushes to the data repository. We chose to update only the data subfolder of the project repository, since the rest of the code requires more curation on our side. This requires setting up a "partial git sync" with only that particular subfolder; in Git terms, this is called a sparse-checkout. These are the steps for setting it up:

  • Create a subfolder as the home directory for your target GitHub repo

    mkdir incidence
    cd incidence

  • Initialize a repository here

    git init
    git remote add -f origin https://github.com/denesdata/gem

  • Set up sparse-checkout and configure it for your data folder

    git config core.sparseCheckout true
    echo "your/data/folder/path/from/origin" >> .git/info/sparse-checkout

  • Pull! If you set it up correctly, this will populate only the data folder.

    git pull origin master

  • Then, any time you generate new data, just send it to the data folder through Jupyter and push it up to the GitHub data repository.

    git add --all
    git commit --all -m "automated incidence data update 2021-05-01"
    git push origin master

  • If you would like to avoid the GitHub authentication prompt on every push, it may help to set up a credential cache. (The last parameter is the cache timeout, in seconds.)

    git config --global credential.helper "cache --timeout=2592000"
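
The commit step above hard-codes the date in the commit message; for an automated push, the message can be generated instead. A minimal sketch, assuming it runs from the sparse-checkout directory set up above (the git commands are the same as in the steps, shown commented out so the sketch stands alone):

```shell
#!/bin/sh
# Build a dated commit message such as "automated incidence data update 2024-11-10".
# date +%F prints the current date in ISO format (YYYY-MM-DD).
msg="automated incidence data update $(date +%F)"
echo "$msg"

# Then commit and push exactly as in the steps above:
# git add --all
# git commit --all -m "$msg"
# git push origin master
```

Running this from cron alongside the daily notebook keeps the repository push and the commit message in sync without manual edits.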

🤓 AWESOME!

You're a real GEM GEEK now! You may, however, want to read further about the structure of the 👉 5. Data.
