Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: clean up wof dictionary generation code #132

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

missinglink
Copy link
Member

@missinglink missinglink commented Feb 23, 2021

I attempted to update the WOF resources today but lost enthusiasm half way through due to a bunch of different changes popping up.

This PR makes subsequent attempts at updating WOF considerably easier:

  • update resources/whosonfirst/generate.js to sort dictionaries before writing them to disk (which makes diffs muuuch easier to read), also added some comments about potential gotchas
  • sort existing dictionaries using the same sorting method (using the monstrous command below)
find resources/whosonfirst/dictionaries -type f -name '*.txt' \
  | node -e 'const fs=require(`fs`); fs.readFileSync(0, `utf-8`).trim().split(`\n`).forEach(file => fs.writeFileSync(file, fs.readFileSync(file, `utf-8`).trim().split(`\n`).sort().join(`\n`)))'

this should be a no-op, it's only sorting existing dictionaries, not adding or removing from them.

@missinglink missinglink requested a review from Joxit February 23, 2021 07:39
@missinglink
Copy link
Member Author

Screenshot 2021-02-23 at 20 48 25

😆

Copy link
Member

@Joxit Joxit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!

@Joxit
Copy link
Member

Joxit commented Feb 23, 2021

Should we also normalize names here ? I found some name with uppercase and accents like Épinay even if LOWER is used in the SQLites statement (thank you French localities 😅).

grep  'Épinay' resources/whosonfirst/dictionaries/locality/name\:fra_x_preferred.txt 

Second thought : Adding normalization will not improve diff reading, so I will merge as-is.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants