update world_bank scraper #349

cjyetman · 2023-09-09T11:23:15Z

Caution: causes many changes/moves in the data CSV

cjyetman · 2023-09-09T12:35:26Z

Hi, jumping in here to point out that most of the changes arise because the dataset is now sorted by wb rather than by country. Wouldn't sorting by country again be an easy way to reduce the number of changes here? Sorry if I missed something obvious about this.

Yes, but my goal here was to update the scraper with minimal changes, not to update the data (which is typically done all together in a separate process). Just pointing it out for @vincentarelbundock's awareness.

NilsEnevoldsen · 2023-09-09T14:02:03Z

It does make it hard to see what, if anything, changed as a result.

vincentarelbundock · 2023-09-09T14:42:39Z

Thanks all. I added one line of code with an arrange to make the diff easier to see. The changes look pretty minimal.

You can merge whenever you want.
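
For readers following along, here is a minimal base R sketch of the kind of sort discussed above. The data frame and its column names (country, wb) are assumptions for illustration, not the scraper's actual code, and the one-line change in this PR may well use a different function.

```r
# Hypothetical stand-in for the scraped World Bank table; the column names
# are assumptions for illustration, not the scraper's real output.
wb <- data.frame(
  country = c("Cuba", "Croatia", "Cyprus"),
  wb = c("CUB", "HRV", "CYP"),
  stringsAsFactors = FALSE
)

# Sort by country name so rows keep a stable order between scraper runs.
# method = "radix" sorts character vectors in the C locale, so the order
# does not depend on the collation settings of the machine running it.
wb_sorted <- wb[order(wb$country, method = "radix"), ]
rownames(wb_sorted) <- NULL

print(wb_sorted)
```

Sorting on the same key as the committed CSV keeps the diff limited to rows whose content actually changed.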

vincentarelbundock · 2023-09-09T14:43:49Z

dictionary/data_world_bank.csv

 Croatia,HRV
 Cuba,CUB
 Curaçao,CUW
 Cyprus,CYP
 Czech Republic,CZE
+Côte d’Ivoire,CIV


Do we want the curly ’ ?

I think it is probably not proper, right? But that's what it is in the original data. Should I make the scraper "fix" it?

Yeah, I think so. My gut feeling is that converting to nice curly should be the typesetter's job, not a data-level thing.
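
If the scraper were to normalize the quote, one possible sketch in base R is below. The vector name and sample values are hypothetical; this only illustrates the replacement being discussed and is not code from the PR.

```r
# U+2019 is the right single quotation mark ("curly" quote);
# "'" is the ASCII apostrophe (U+0027).
country <- c("C\u00f4te d\u2019Ivoire", "Cuba")

# Replace every curly quote with the ASCII apostrophe.
# fixed = TRUE treats the pattern as a literal string, not a regex.
country_ascii_quote <- gsub("\u2019", "'", country, fixed = TRUE)

print(country_ascii_quote)
```

Only the quote character changes; other non-ASCII letters such as ô are left alone.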

🤔

packageVersion("countrycode")
#> [1] '1.5.0'
countrycode::countrycode("CIV", "iso3c", "country.name")
#> [1] "Côte d’Ivoire"

It's hard to see the difference with GitHub's formatting, but both the ASCII single quote (U+0027) and the Unicode curly quote (U+2019) work as input, and the curly quote is what's currently in countrycode::codelist, so I'm assuming leaving it as a curly quote is OK, maybe even ideal.

library(countrycode)
packageVersion("countrycode")
#> [1] '1.6.0'
as.hexmode(utf8ToInt("'"))
#> [1] "27"
as.hexmode(utf8ToInt("’"))
#> [1] "2019"
countrycode("Côte d\U27Ivoire", "country.name", "country.name")
#> [1] "Côte d’Ivoire"
countrycode("Côte d\U2019Ivoire", "country.name", "country.name")
#> [1] "Côte d’Ivoire"
stringi::stri_escape_unicode(countrycode("CI", "iso2c", "country.name"))
#> [1] "C\\u00f4te d\\u2019Ivoire"

cjyetman · 2023-09-09T15:55:59Z

@vincentarelbundock I was under the impression that all of these "getter" scripts got run together in a Docker container during some part of your process, so I included the changed CSV just to facilitate reviewing the consequences of my changes, with the intention of removing it before merging. Looking a bit deeper now, it seems that maybe that no longer happens, or the process has changed? Should I include the modified CSV as well, if it changes, when making a change to a getter script?

remove no longer relevant comment

vincentarelbundock · 2023-09-09T18:54:28Z

@vincentarelbundock I was under the impression that all of these "getter" scripts got run together in a Docker container during some part of your process, so I included the changed CSV just to facilitate reviewing the consequences of my changes, with the intention of removing it before merging. Looking a bit deeper now, it seems that maybe that no longer happens, or the process has changed? Should I include the modified CSV as well, if it changes, when making a change to a getter script?

Yeah, it's been a while, but if I remember correctly, my previous setup was ridiculously over-(and badly-) engineered. So I simplified everything. get_*() saves a CSV file that we keep in the repo. Then, build.R reads and merges all the CSV files.

I always run it on my local machine, never in Docker. Maybe I should do that, but things have worked OK thus far...
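
The workflow described above (each get_*() script saves a CSV, then build.R reads and merges them) could be sketched roughly as follows. This is not the repository's build.R: the temporary directory, the file name data_iso.csv, the column names, and the merge key "country" are all assumptions made for illustration.

```r
# Write two tiny stand-in CSVs to a temporary directory, playing the role of
# the files that the getter scripts would save. File names, columns, and
# values here are hypothetical.
dict_dir <- tempfile("dictionary")
dir.create(dict_dir)
write.csv(
  data.frame(country = c("Croatia", "Cuba"), wb = c("HRV", "CUB")),
  file.path(dict_dir, "data_world_bank.csv"),
  row.names = FALSE
)
write.csv(
  data.frame(country = c("Croatia", "Cuba"), iso2c = c("HR", "CU")),
  file.path(dict_dir, "data_iso.csv"),
  row.names = FALSE
)

# Read every data_*.csv file in the directory.
files <- list.files(dict_dir, pattern = "^data_.*\\.csv$", full.names = TRUE)
tables <- lapply(files, read.csv, stringsAsFactors = FALSE)

# Merge all tables on the assumed shared key, keeping unmatched rows.
merged <- Reduce(
  function(x, y) merge(x, y, by = "country", all = TRUE),
  tables
)

print(merged)
```

Keeping the per-source CSVs in the repo means a change to one getter script shows up as a diff in exactly one data file.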

update world_bank scraper

dba113f

cjyetman requested a review from vincentarelbundock September 9, 2023 11:23

sort order to facilitate diff

0b18788

vincentarelbundock approved these changes Sep 9, 2023

vincentarelbundock reviewed Sep 9, 2023

Update get_world_bank.R

e2dd246

remove no longer relevant comment

cjyetman marked this pull request as ready for review September 27, 2024 16:55

cjyetman merged commit ecf0013 into main Sep 27, 2024
6 checks passed

cjyetman deleted the update-world_bank_scraper branch September 27, 2024 16:55
