Hash mismatch for wine_reviews_130k_varietals_75 #40

Open
hamelin opened this issue Sep 30, 2021 · 4 comments

Comments

@hamelin
Contributor

hamelin commented Sep 30, 2021

In notebook 03, when loading dataset wine_reviews_130k_varietals_75, I get a data hash mismatch. The intended SHA-1 is 52ea2825926ce21c8641109acdd6f889587d9c36, but the SHA-1 that is computed is 8b234d7595929d589c1a6781730fcb5b75e351e2.

It is easy to monkeypatch around this, but I wonder whether it has the same cause as the other hash mismatch issues encountered previously.
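For readers unfamiliar with this kind of check, here is a minimal sketch of what a hash verification step looks like in general. It is not the project's actual loader code: `verify_sha1`, the toy payload, and the error message are all hypothetical, and only the standard-library `hashlib` is used.

```python
import hashlib


def verify_sha1(payload: bytes, expected_hex: str) -> None:
    """Raise ValueError if the SHA-1 of payload differs from expected_hex.

    Hypothetical helper for illustration; not part of any real library.
    """
    actual_hex = hashlib.sha1(payload).hexdigest()
    if actual_hex != expected_hex:
        raise ValueError(
            f"hash mismatch: expected {expected_hex}, computed {actual_hex}"
        )


# Toy payload standing in for the serialized dataset (an assumption).
payload = b"toy dataset bytes"
recorded = hashlib.sha1(payload).hexdigest()

# The unchanged payload passes the check silently.
verify_sha1(payload, recorded)

# Changing even a single byte changes the digest, so the check fails.
try:
    verify_sha1(payload + b"!", recorded)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

A monkeypatch-style workaround amounts to replacing the recorded digest with the locally computed one, which silences the error but also defeats the purpose of the check.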

@hamelin
Contributor Author

hamelin commented Sep 30, 2021

Platform is Windows 10, 64-bit CPU.

@hamelin
Contributor Author

hamelin commented Sep 30, 2021

Debug dump is attached.

@acwooding
Owner

Thanks. This looks similar to the hash mismatch in issue #28. Not sure yet whether it is another instance of the pandas+joblib hash issue or whether, in addition, we are missing a "sort".
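Both suspected causes can be illustrated with a small standard-library sketch. It assumes nothing about how the project actually hashes datasets: the helpers `sha1_of_pickle` and `sha1_of_rows` and the toy rows are hypothetical. The first part shows that hashing a serialized object depends on the serialization format (here, the pickle protocol stands in for differences between library versions); the second shows that hashing rows in arrival order depends on that order unless the rows are sorted first.

```python
import hashlib
import pickle


def sha1_of_pickle(obj, protocol):
    """SHA-1 of the pickled bytes of obj (hypothetical helper)."""
    return hashlib.sha1(pickle.dumps(obj, protocol=protocol)).hexdigest()


def sha1_of_rows(rows, sort=False):
    """SHA-1 over a simple text encoding of rows, optionally sorted first."""
    if sort:
        rows = sorted(rows)
    digest = hashlib.sha1()
    for row in rows:
        digest.update(repr(row).encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


# Toy rows (made up for illustration): same content, different order.
rows_a = [("Pinot Noir", 87), ("Chardonnay", 90), ("Riesling", 88)]
rows_b = [("Riesling", 88), ("Pinot Noir", 87), ("Chardonnay", 90)]

# Same object, different serialization format: the digests differ.
serialization_differs = sha1_of_pickle(rows_a, 2) != sha1_of_pickle(rows_a, 4)

# Same rows, different order: the digests differ without a sort...
order_differs = sha1_of_rows(rows_a) != sha1_of_rows(rows_b)

# ...and agree once the rows are put into a canonical order.
sorted_matches = sha1_of_rows(rows_a, sort=True) == sha1_of_rows(rows_b, sort=True)
```

If the mismatch comes from serialization, sorting alone will not fix it; the hash would need to be computed over a representation that does not change between library versions.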

@hackalog
Contributor

Easydata issue: hackalog/easydata#231
To reproduce, try pinning pandas to version 1.0.5 and to version 1.3.2.
