[Bug]: numpy.int64 types are not serialized correctly #33020
Comments
Here's another case which is clearly broken:
It's the same serialization problem: numpy types as a whole don't seem to be handled correctly by the coders and wind up being clobbered to their base value.
Tested this with different conditions.
Test code:
    import apache_beam as beam
    import numpy as np
    with beam.Pipeline() as pipeline:
        indata = pipeline | "Create" >> beam.Create([(a, int(a)) for a in np.arange(3)])
        # Apply CombinePerKey to sum values for each key.
        outdata = indata | "CombinePerKey" >> beam.CombinePerKey(sum) | beam.Map(print)
Run this with Python 3.10 and Beam raises the expected error:
WARNING:apache_beam.coders.coder_impl:Using fallback deterministic coder for type '<class 'numpy.int64'>' in 'Create/MaybeReshuffle/Reshuffle/ReshufflePerKey/GroupByKey'.
ERROR:apache_beam.runners.common:Unable to deterministically encode '0' of type '<class 'numpy.int64'>', please provide a type hint for the input of 'Create/MaybeReshuffle/Reshuffle/ReshufflePerKey/GroupByKey' [while running 'Create/Map(decode)']
Traceback (most recent call last):
File "apache_beam/runners/common.py", line 1501, in apache_beam.runners.common.DoFnRunner.process
File "apache_beam/runners/common.py", line 689, in apache_beam.runners.common.SimpleInvoker.invoke_process
File "apache_beam/runners/common.py", line 1687, in apache_beam.runners.common._OutputHandler.handle_process_outputs
File "apache_beam/runners/common.py", line 1800, in apache_beam.runners.common._OutputHandler._write_value_to_tag
File "apache_beam/runners/worker/operations.py", line 262, in apache_beam.runners.worker.operations.SingletonElementConsumerSet.receive
File "apache_beam/runners/worker/operations.py", line 205, in apache_beam.runners.worker.operations.ConsumerSet.update_counters_start
File "apache_beam/runners/worker/opcounters.py", line 210, in apache_beam.runners.worker.opcounters.OperationCounters.update_from
File "apache_beam/runners/worker/opcounters.py", line 262, in apache_beam.runners.worker.opcounters.OperationCounters.do_sample
File "apache_beam/coders/coder_impl.py", line 1493, in apache_beam.coders.coder_impl.WindowedValueCoderImpl.get_estimated_size_and_observables
File "apache_beam/coders/coder_impl.py", line 1504, in apache_beam.coders.coder_impl.WindowedValueCoderImpl.get_estimated_size_and_observables
File "apache_beam/coders/coder_impl.py", line 1053, in apache_beam.coders.coder_impl.AbstractComponentCoderImpl.get_estimated_size_and_observables
File "apache_beam/coders/coder_impl.py", line 377, in apache_beam.coders.coder_impl.FastPrimitivesCoderImpl.get_estimated_size_and_observables
File "apache_beam/coders/coder_impl.py", line 457, in apache_beam.coders.coder_impl.FastPrimitivesCoderImpl.encode_to_stream
File "apache_beam/coders/coder_impl.py", line 518, in apache_beam.coders.coder_impl.FastPrimitivesCoderImpl.encode_special_deterministic
TypeError: Unable to deterministically encode '0' of type '<class 'numpy.int64'>', please provide a type hint for the input of 'Create/MaybeReshuffle/Reshuffle/ReshufflePerKey/GroupByKey'
With Python 3.11 and Python 3.12, Beam emits the warnings but does not stop running the code:
WARNING:apache_beam.coders.coder_impl:Using fallback deterministic coder for type '<class 'numpy.int64'>' in 'Create/MaybeReshuffle/Reshuffle/ReshufflePerKey/GroupByKey'.
WARNING:apache_beam.coders.coder_impl:Using fallback deterministic coder for type '<class 'numpy.int64'>' in 'Create/MaybeReshuffle/Reshuffle/ReshufflePerKey/GroupByKey'.
(0, 3)
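A possible workaround, sketched here rather than confirmed anywhere in this thread, is to convert numpy integer keys to built-in `int` before they reach `beam.Create`. The helper names `to_builtin_int` and `normalize_pair` are made up for illustration. The sketch relies on NumPy registering its integer scalar types with `numbers.Integral`, so it does not need to import NumPy itself.

```python
import numbers


def to_builtin_int(value):
    """Return a plain ``int`` for integer-like scalars; pass other values through."""
    # bool is a subclass of int; leave it alone so True/False keep their meaning.
    if isinstance(value, bool):
        return value
    # NumPy registers its integer scalar types (such as numpy.int64) with
    # numbers.Integral, so this check matches them without importing NumPy.
    if isinstance(value, numbers.Integral):
        return int(value)
    return value


def normalize_pair(pair):
    """Convert both the key and the value of a (key, value) tuple."""
    key, value = pair
    return (to_builtin_int(key), to_builtin_int(value))
```

In the test code above, this would amount to building the input as `[normalize_pair((a, int(a))) for a in np.arange(3)]`, or more simply as `(int(a), int(a))`. It only sidesteps the numpy key type; it does not address the coder behaviour itself.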
More notes: Looks like
|
For numpy,
|
What happened?
Relevant repro of the problem:
Relevant error:
The problem appears to be that the coder does not know how to handle the numpy int64 type, and the fallback coder (PickleCoder, I believe) cannot encode the type deterministically, so it clobbers the content inside the class to its base value of 0.
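Because the warning names only the offending type, it may help to check the input pairs before handing them to the pipeline. The following is a minimal sketch under stated assumptions: `find_suspect_keys` is a hypothetical helper, and `BUILTIN_KEY_TYPES` is an illustrative allow-list, not the set of types Beam's coders actually special-case.

```python
# Assumption: an illustrative allow-list, not Beam's real list of handled types.
BUILTIN_KEY_TYPES = (int, str, bytes, float, bool, type(None))


def find_suspect_keys(pairs):
    """Return (index, type name) for each key whose exact type is not a built-in."""
    suspects = []
    for index, (key, _value) in enumerate(pairs):
        # Compare the exact type: subclasses and numpy scalars are reported too.
        if type(key) not in BUILTIN_KEY_TYPES:
            suspects.append((index, type(key).__name__))
    return suspects
```

Run against the repro input, a check like this would be expected to flag every key as `int64`, since `np.arange(3)` yields numpy scalars rather than Python integers.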
Issue Priority
Priority: 2 (default / most bugs should be filed as P2)
Issue Components