It looks like release 2.38 refactored how ray.data.Datasink works and this broke our integration.
In version 2.37 and below each call to write was allowed to return some output:
A user-defined output. Can be anything, and the returned value is passed to on_write_complete().
However, in version 2.38, the signature for write changed to return None.
We were relying on this feature because, in each call to write, we create a "fragment" and return the fragment metadata (a small JSON string). In on_write_complete we would take these fragments and commit them as a single transaction, completing the write.
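For illustration, here is a minimal sketch of the pattern we depend on. It does not import Ray: `FragmentSink` and `run_write` are hypothetical stand-ins that only mirror the shape of the pre-2.38 `write` / `on_write_complete` hooks, and the fragment metadata is reduced to a row count.

```python
import json
from typing import Any, Iterable, List


class FragmentSink:
    """Hypothetical stand-in mirroring the pre-2.38 hooks (not a Ray class)."""

    def __init__(self) -> None:
        self.committed: List[dict] = []

    def write(self, blocks: Iterable[Any], ctx: Any = None) -> str:
        # Each write call creates one "fragment" and returns its metadata
        # as a small JSON string.
        rows = [row for block in blocks for row in block]
        return json.dumps({"num_rows": len(rows)})

    def on_write_complete(self, write_results: List[str]) -> None:
        # Collect every fragment's metadata and commit them together,
        # standing in for a single transaction.
        self.committed = [json.loads(result) for result in write_results]


def run_write(sink: FragmentSink, partitions: List[List[list]]) -> List[str]:
    # Stand-in for the framework: one write() call per partition, then the
    # collected return values are handed to on_write_complete().
    results = [sink.write(blocks) for blocks in partitions]
    sink.on_write_complete(results)
    return results


sink = FragmentSink()
results = run_write(sink, [[[1, 2], [3]], [[4]]])
```

The key point is that the return value of `write` is the only channel carrying fragment metadata to the commit step.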
I suppose we will need to figure out some other way to store temporary state in version 2.38. Maybe we can store it on the datasink itself? My hunch is that this will not work because the write calls will run on a different worker than the call to on_write_complete.
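A small sketch of why I expect that to fail, assuming each write task operates on its own copy of the datasink rather than the driver's object. `StatefulSink` is hypothetical, and `copy.deepcopy` stands in here for the serialization round trip to a worker.

```python
import copy


class StatefulSink:
    """Hypothetical sink that tries to stash fragment metadata on itself."""

    def __init__(self) -> None:
        self.fragments = []

    def write(self, block) -> None:
        # Mutates only the instance this method is called on.
        self.fragments.append({"num_rows": len(block)})


driver_sink = StatefulSink()

# Assumption: the worker receives an independent copy of the sink.
worker_sink = copy.deepcopy(driver_sink)
worker_sink.write([1, 2, 3])

# The worker's copy has the fragment, but the driver's original does not,
# so a commit step running on the driver would see nothing.
worker_fragments = worker_sink.fragments
driver_fragments = driver_sink.fragments
```

If that assumption holds, any state written in `write` would have to leave the worker through a return value or external storage.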
ref: ray-project/ray#49211