The current way we handle data hashing doesn't survive package upgrades. For example, with pandas we have been pickling dataframes and hashing the bytes, and the hashes change across pandas upgrades even when the data itself doesn't.
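A minimal sketch of the failure mode being described, assuming the hash is taken over the pickle byte stream (the function name here is hypothetical, not the project's actual API):

```python
import hashlib
import pickle

import pandas as pd

def pickle_digest(obj) -> str:
    """Hypothetical sketch of the current approach: hash the pickle bytes."""
    # The byte stream encodes pandas internals, so it can change
    # across pandas versions even when the data is unchanged.
    return hashlib.sha256(pickle.dumps(obj)).hexdigest()

df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})
print(pickle_digest(df))  # may differ across pandas versions for identical data
```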
The risk (which is the reason, I assume, it was not done this way already) is that pickle's memoization will interfere with hashing and create spurious changes in the pickle string of dtypes, with the final consequence of assigning different hash values to seemingly identical objects.
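A small illustration of the memoization hazard, assuming hashes are computed over the pickle byte stream: pickle memoizes by object identity, so equal values can serialize to different bytes depending on whether they share an object.

```python
import pickle

inner = [1, 2, 3]
shared = pickle.dumps([inner, inner])          # second element becomes a memo reference
copies = pickle.dumps([[1, 2, 3], [1, 2, 3]])  # both elements serialized in full

assert [inner, inner] == [[1, 2, 3], [1, 2, 3]]  # equal values...
assert shared != copies                          # ...different byte streams, so different hashes
```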
I think there's a really deep issue here: to be truly reproducible, we need a hash that's more aware of the data, since certain serialized formats change version-to-version even though the underlying raw data is identical.
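A sketch of one possible direction, using pandas' own value-based hasher (pd.util.hash_pandas_object) rather than pickle bytes; whether that hasher itself stays stable across pandas versions is an assumption worth verifying, so treat this as an illustration, not a drop-in fix:

```python
import hashlib

import pandas as pd

def dataframe_digest(df: pd.DataFrame) -> str:
    """Digest a DataFrame from its values and schema, not its pickle bytes."""
    h = hashlib.sha256()
    # hash_pandas_object derives one uint64 per row from the underlying values
    h.update(pd.util.hash_pandas_object(df, index=True).values.tobytes())
    # fold in column names and dtypes so schema changes alter the digest
    h.update(repr([(c, str(t)) for c, t in df.dtypes.items()]).encode())
    return h.hexdigest()

df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})
print(dataframe_digest(df))  # derived from values, not serialization internals
```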