Confirmed cases for 长春市 and 吉林市 were flipped #110

Closed
beansrowning opened this issue Mar 25, 2022 · 7 comments

Comments

@beansrowning

This doesn't seem like something you can fix on your end, but I wanted to make you and others aware that the confirmed case values reported for Changchun and Jilin City were flipped on 3/15, which has thrown off cumulative case counts since.

Per NHC on 3/15 (http://www.nhc.gov.cn/xcs/yqtb/202203/8d8d2035b3884fcfb734e0ab07bede79.shtml):

Changchun: +460
Jilin City: +2601

Per DXY scrape:
[image not preserved]

Also, just a thank you for your continued support of this service. I started using these data early in 2020, and they've continued to be valuable some two years later. I've used them to provide situational awareness to US Government and colleagues internationally. I've found no better publicly available source of ADM2-level data to date.

Cheers!

@qianxliu

qianxliu commented Mar 30, 2022

Surely the Johns Hopkins data on COVID in China is definitely inaccurate.
As a Chinese person, I'm also trying to build a useful dataset that doesn't require manual compilation,
but I'm afraid I may have to build it by hand.

@qianxliu

I guess the Johns Hopkins data being totally different from the real China COVID data is a big reason for incorrect assessments of China's novel coronavirus pneumonia outbreak.

@beansrowning
Author

JHU works well at provincial level (ADM 1) but there is no Chinese prefecture/city level data (ADM 2) that I'm aware of.

@qianxliu

qianxliu commented Mar 30, 2022

I have already compared the JHU data with the official reports, and they are totally mismatched.
I also used the JHU data for analysis and found it behaves very strangely when modeling COVID-19 in China.
data source: https://github.com/CSSEGISandData/COVID-19

About 1200 in recent days, but as far as I know, the data is totally different.
https://voice.baidu.com/act/newpneumonia/newpneumonia/?from=osari_aladin_banner

I still don't know how to use this data repo, but maybe you can try doing a comparison.

@qianxliu

To get Chinese data, you must know Chinese and spend time organizing it.
Chinese scholars generally do not do this kind of work.

@qianxliu

qianxliu commented Apr 1, 2022

@beansrowning
I have found that the JHU data is basically correct.
My erroneous data came from this:
https://github.com/Kamaropoulos/COVID19Py
Its data comes from JHU, but the implementation is bad.
This package has a terrible bug that needs to be fixed.

Case finally solved.

@BlankerL
Owner

Hi @beansrowning,

Thanks a lot for your report! I have checked the NHC post from 3/15 that you linked, the database, and the dumped CSV/JSON files, and you are right. The values for Changchun and Jilin City at timestamp 1647311180134 (Tue Mar 15, 2022, 02:26:20 GMT+0000, or Tue Mar 15, 2022, 10:26:20 GMT+0800) are flipped.
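For anyone who wants to locate that record, here is a minimal sketch of how the millisecond timestamp above maps to the two wall-clock times quoted. It uses only the Python standard library; the variable names are illustrative and not taken from the crawler.

```python
from datetime import datetime, timedelta, timezone

# Timestamp quoted in this thread, in milliseconds since the Unix epoch.
ts_ms = 1647311180134

# Split into whole seconds and milliseconds to avoid float rounding.
seconds, millis = divmod(ts_ms, 1000)

# Interpret as UTC (GMT+0000).
utc = datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(milliseconds=millis)

# Same instant expressed in GMT+0800.
cst = utc.astimezone(timezone(timedelta(hours=8)))

print(utc.isoformat())
print(cst.isoformat())
```

Both printed values describe the same instant, so filtering on either representation should select the same rows.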

These data are scraped directly from DXY, which, as far as I know, collects and updates the figures on its site manually and so might contain some wrong values, especially in the early days.

The purpose of the crawler and this data warehouse is to maintain a time-series database, so I prefer not to modify the data directly in the database. Instead, I will document this problem in the Noise Data section of the README for others' awareness. Thanks again for your contribution.
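Since the stored data stays unchanged, users who need corrected values could apply the swap on their own copy. The following is only a rough sketch of that idea: the record layout (`city`, `ts`, `confirmed`) and the function name `swap_flipped` are hypothetical, not the dataset's actual schema, and the numbers are placeholders rather than real counts. Whether later rows also need adjusting is not established in this thread.

```python
from copy import deepcopy

FLIPPED_TS = 1647311180134      # timestamp identified in this thread
PAIR = ("长春市", "吉林市")       # the two cities whose values were exchanged


def swap_flipped(records, ts=FLIPPED_TS, pair=PAIR, field="confirmed"):
    """Return a corrected copy of `records`; the input is left untouched.

    `records` is a list of dicts with hypothetical keys "city", "ts"
    and `field`. Only rows at `ts` for the two cities in `pair` change.
    """
    fixed = deepcopy(records)
    rows = {r["city"]: r for r in fixed if r["ts"] == ts and r["city"] in pair}
    if len(rows) == 2:  # swap only when both cities are present
        a, b = (rows[name] for name in pair)
        a[field], b[field] = b[field], a[field]
    return fixed


# Placeholder values for illustration only, NOT real case counts.
raw = [
    {"city": "长春市", "ts": FLIPPED_TS, "confirmed": 222},
    {"city": "吉林市", "ts": FLIPPED_TS, "confirmed": 111},
    {"city": "长春市", "ts": FLIPPED_TS + 1, "confirmed": 5},
]
fixed = swap_flipped(raw)
```

Working on a copy keeps the scraped time series intact, in line with the preference above, while giving downstream analyses a corrected view.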

Also, thank you for letting me know the dataset really helped in your research. Being helpful in scientific research is my main purpose in building and maintaining this dataset.

Cheers!
