[distributed_dp] Including package versions into the requirements file #57

fraboeni opened this issue Mar 8, 2022 · 12 comments

fraboeni commented Mar 8, 2022

Hi everyone,

First of all, thank you very much for providing the very nice distributed_dp package.

I was trying to get it to work, and installed the packages referenced in https://github.com/google-research/federated/blob/master/distributed_dp/requirements.txt. Unfortunately, even though I installed the nightly build versions of all the packages as indicated in the README, there seem to be compatibility issues.

I've tried a couple of different combinations of versions for tf, tf-federated, tf-privacy, and tf-estimator, but the code did not run with any of them.

My current setup is

...
python                    3.9.7                h12debd9_1
keras-nightly             2.9.0.dev2022030808          pypi_0    pypi
tb-nightly                2.9.0a20220307           pypi_0    pypi
tensorboard               2.8.0                    pypi_0    pypi
tensorboard-data-server   0.6.1                    pypi_0    pypi
tensorboard-plugin-wit    1.6.0                      py_0
tensorflow-datasets       4.5.2                    pypi_0    pypi
tensorflow-federated-nightly 0.19.0.dev20220218          pypi_0    pypi
tensorflow-io-gcs-filesystem 0.24.0                   pypi_0    pypi
tensorflow-metadata       1.7.0                    pypi_0    pypi
tensorflow-model-optimization 0.7.1                    pypi_0    pypi
tensorflow-privacy        0.7.3                    pypi_0    pypi
tensorflow-probability    0.15.0                   pypi_0    pypi
tf-estimator-nightly      2.9.0.dev2022030809          pypi_0    pypi
tf-nightly                2.9.0.dev20220308          pypi_0    pypi
... 

In this setup, I get the error

Traceback (most recent call last):
  File "/home/fraboeni/.cache/bazel/_bazel_fraboeni/eb0df9f25fbadff22165e0e943d33a0f/execroot/org_federated_research/bazel-out/k8-opt/bin/distributed_dp/fl_run.runfiles/org_federated_research/distributed_dp/fl_run.py", line 28, in <module>
    from distributed_dp import fl_utils
  File "/home/fraboeni/.cache/bazel/_bazel_fraboeni/eb0df9f25fbadff22165e0e943d33a0f/execroot/org_federated_research/bazel-out/k8-opt/bin/distributed_dp/fl_run.runfiles/org_federated_research/distributed_dp/fl_utils.py", line 22, in <module>
    from distributed_dp import accounting_utils
  File "/home/fraboeni/.cache/bazel/_bazel_fraboeni/eb0df9f25fbadff22165e0e943d33a0f/execroot/org_federated_research/bazel-out/k8-opt/bin/distributed_dp/fl_run.runfiles/org_federated_research/distributed_dp/accounting_utils.py", line 21, in <module>
    import tensorflow_privacy as tfp
  File "/home/fraboeni/.conda/envs/tf-federated/lib/python3.9/site-packages/tensorflow_privacy/__init__.py", line 30, in <module>
    from tensorflow_privacy import v1
  File "/home/fraboeni/.conda/envs/tf-federated/lib/python3.9/site-packages/tensorflow_privacy/v1/__init__.py", line 32, in <module>
    from tensorflow_privacy.privacy.estimators.v1.dnn import DNNClassifier as DNNClassifierV1
  File "/home/fraboeni/.conda/envs/tf-federated/lib/python3.9/site-packages/tensorflow_privacy/privacy/estimators/v1/dnn.py", line 19, in <module>
    from tensorflow_privacy.privacy.estimators.v1 import head as head_lib
  File "/home/fraboeni/.conda/envs/tf-federated/lib/python3.9/site-packages/tensorflow_privacy/privacy/estimators/v1/head.py", line 22, in <module>
    from tensorflow.python.ops import lookup_ops  # pylint: disable=g-direct-tensorflow-import
ImportError: cannot import name 'lookup_ops' from 'tensorflow.python.ops' (unknown location)

when running bazel run :fl_run

My question now is the following: could you share the version numbers in your requirements.txt file with which the code runs successfully?

kenziyuliu (Contributor) commented Mar 9, 2022

Hi @fraboeni,

Thanks for your interest! I just tried cloning the repo locally and creating a new conda environment, and I was able to get it running with the following commands:

conda create -n tff python=3.9
conda activate tff
pip install -r requirements.txt   # inside `distributed_dp/`
pip install tensorflow-addons
bazel run :fl_run  # the example command for EMNIST

The specific versions of the related packages:

...
python                    3.9.7                h88f2d9e_1
tensorboard               2.8.0                    pypi_0    pypi
tensorboard-data-server   0.6.1                    pypi_0    pypi
tensorboard-plugin-wit    1.8.1                    pypi_0    pypi
tensorflow                2.8.0                    pypi_0    pypi
tensorflow-addons         0.16.1                   pypi_0    pypi
tensorflow-datasets       4.5.2                    pypi_0    pypi
tensorflow-estimator      2.8.0                    pypi_0    pypi
tensorflow-federated      0.20.0                   pypi_0    pypi
tensorflow-io-gcs-filesystem 0.24.0                   pypi_0    pypi
tensorflow-metadata       1.7.0                    pypi_0    pypi
tensorflow-model-optimization 0.7.1                    pypi_0    pypi
tensorflow-privacy        0.7.3                    pypi_0    pypi
tensorflow-probability    0.16.0                   pypi_0    pypi
tf-estimator-nightly      2.8.0.dev2021122109          pypi_0    pypi
...

It seems that nightly builds are not needed, but you do need tensorflow-addons, which was not specified in requirements.txt. Could you try the above and see if it works?
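
For reference, a fully pinned requirements.txt based on the environment above might look roughly like this (a sketch only, using the versions from the listing; exact pins may need adjusting on other platforms):

# Sketch of a pinned requirements.txt, versions taken from the listing above
tensorflow==2.8.0
tensorflow-federated==0.20.0
tensorflow-privacy==0.7.3
tensorflow-model-optimization==0.7.1
tensorflow-probability==0.16.0
tensorflow-datasets==4.5.2
tensorflow-addons==0.16.1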

@fraboeni (Author)

Thank you so much for your help with that @kenziyuliu.
The installation worked just fine.

Now, I am running into different errors: I ran bazel run :fl_run and got

INFO: Analyzed target //distributed_dp:fl_run (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
Target //distributed_dp:fl_run up-to-date:
  bazel-bin/distributed_dp/fl_run
INFO: Elapsed time: 0.120s, Critical Path: 0.00s
INFO: 1 process: 1 internal.
INFO: Build completed successfully, 1 total action
INFO: Build completed successfully, 1 total action
E0310 17:25:35.563270 139699288257280 optimizer_utils.py:264] Unknown optimizer [None], known optimziers are [['sgd', 'adagrad', 'adam', 'yogi', 'lars', 'lamb', 'shampoo']]. To add support for an optimizer, add the optimzier class to the utils_impl._SUPPORTED_OPTIMIZERS list.
Traceback (most recent call last):
  File "/home/fraboeni/.cache/bazel/_bazel_fraboeni/eb0df9f25fbadff22165e0e943d33a0f/execroot/org_federated_research/bazel-out/k8-opt/bin/distributed_dp/fl_run.runfiles/org_federated_research/distributed_dp/fl_run.py", line 290, in <module>
    app.run(main)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/home/fraboeni/.cache/bazel/_bazel_fraboeni/eb0df9f25fbadff22165e0e943d33a0f/execroot/org_federated_research/bazel-out/k8-opt/bin/distributed_dp/fl_run.runfiles/org_federated_research/distributed_dp/fl_run.py", line 185, in main
    client_optimizer_fn = optimizer_utils.create_optimizer_fn_from_flags('client')
  File "/home/fraboeni/.cache/bazel/_bazel_fraboeni/eb0df9f25fbadff22165e0e943d33a0f/execroot/org_federated_research/bazel-out/k8-opt/bin/distributed_dp/fl_run.runfiles/org_federated_research/utils/optimizers/optimizer_utils.py", line 269, in create_optimizer_fn_from_flags
    raise ValueError('`{!s}` is not a valid optimizer for flag --{!s}, must be '
ValueError: `None` is not a valid optimizer for flag --client_optimizer, must be one of ['sgd', 'adagrad', 'adam', 'yogi', 'lars', 'lamb', 'shampoo']. See error log for details.

The issue did not occur when specifying the flags as in your example:

bazel run :fl_run -- \
    --task=emnist_character \
    --server_optimizer=sgd \
    --server_learning_rate=1 \
    --server_sgd_momentum=0.9 \
    --client_optimizer=sgd \
    --client_learning_rate=0.03 \
    --client_batch_size=20 \
    --experiment_name=my_emnist_test \
    --epsilon=10 \
    --l2_norm_clip=0.03 \
    --dp_mechanism=ddgauss \
    --logtostderr

This started out very promisingly, but then I got a different error:

I0310 17:29:00.706568 139991547269888 fl_utils.py:71] Shared DP Parameters:
I0310 17:29:00.706730 139991547269888 fl_utils.py:72] {'clip': 0.03,
 'delta': 0.0002941176470588235,
 'dim': 1018174,
 'epsilon': 10.0,
 'mechanism': 'ddgauss',
 'num_clients': 3400,
 'num_clients_per_round': 100,
 'num_rounds': 1500,
 'sampling_rate': 0.029411764705882353}
I0310 17:30:57.426323 139991547269888 fl_utils.py:151] ddgauss parameters:
I0310 17:30:57.426513 139991547269888 fl_utils.py:152] {'beta': 0.6065306597126334,
 'bits': 16,
 'dim': 1018174,
 'gamma': 3.292593044721554e-06,
 'inflated_l2': 0.030049064475707276,
 'k_stddevs': 4,
 'local_stddev': 0.002681329925591648,
 'mechanism': 'ddgauss',
 'noise_mult_clip': 0.8937766418638827,
 'noise_mult_inflated': 0.8923172725591274,
 'padded_dim': 1048576.0,
 'scale': 303711.99429067835}
I0310 17:30:57.426573 139991547269888 ddpquery_utils.py:44] Conditional rounding set to True (beta = 0.606531)
I0310 17:30:57.510118 139991547269888 keras_utils.py:365] Adding default num_examples metric to model
I0310 17:30:57.510220 139991547269888 keras_utils.py:368] Adding default num_batches metric to model
I0310 17:30:58.755060 139991547269888 keras_utils.py:365] Adding default num_examples metric to model
I0310 17:30:58.755179 139991547269888 keras_utils.py:368] Adding default num_batches metric to model
I0310 17:31:00.380089 139991547269888 keras_utils.py:365] Adding default num_examples metric to model
I0310 17:31:00.380198 139991547269888 keras_utils.py:368] Adding default num_batches metric to model
I0310 17:31:02.371215 139991547269888 keras_utils.py:365] Adding default num_examples metric to model
I0310 17:31:02.371326 139991547269888 keras_utils.py:368] Adding default num_batches metric to model
I0310 17:31:02.647132 139991547269888 keras_utils.py:365] Adding default num_examples metric to model
I0310 17:31:02.647240 139991547269888 keras_utils.py:368] Adding default num_batches metric to model
I0310 17:31:02.859875 139991547269888 training_utils.py:68] Writing...
I0310 17:31:02.859981 139991547269888 training_utils.py:69]     program state to: /tmp/ddp_fl/checkpoints/my_emnist_test
I0310 17:31:02.860028 139991547269888 training_utils.py:70]     CSV metrics to: /tmp/ddp_fl/results/my_emnist_test/experiment.metrics.csv
I0310 17:31:02.860080 139991547269888 training_utils.py:71]     TensorBoard summaries to: /tmp/ddp_fl/logdir/my_emnist_test
I0310 17:31:02.860128 139991547269888 training_loop.py:189] Running training process
I0310 17:31:03.333363 139991547269888 training_loop.py:201] Initializing training process
I0310 17:31:03.397290 139991547269888 training_loop.py:115] Running evaluation at round 0
Traceback (most recent call last):
  File "/home/fraboeni/.cache/bazel/_bazel_fraboeni/eb0df9f25fbadff22165e0e943d33a0f/execroot/org_federated_research/bazel-out/k8-opt/bin/distributed_dp/fl_run.runfiles/org_federated_research/distributed_dp/fl_run.py", line 290, in <module>
    app.run(main)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/home/fraboeni/.cache/bazel/_bazel_fraboeni/eb0df9f25fbadff22165e0e943d33a0f/execroot/org_federated_research/bazel-out/k8-opt/bin/distributed_dp/fl_run.runfiles/org_federated_research/distributed_dp/fl_run.py", line 274, in main
    state = tff.simulation.run_training_process(
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/simulation/training_loop.py", line 206, in run_training_process
    evaluation_metrics = _run_evaluation(evaluation_fn,
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/simulation/training_loop.py", line 119, in _run_evaluation
    evaluation_metrics = evaluation_fn(state, evaluation_data)
  File "/home/fraboeni/.cache/bazel/_bazel_fraboeni/eb0df9f25fbadff22165e0e943d33a0f/execroot/org_federated_research/bazel-out/k8-opt/bin/distributed_dp/fl_run.runfiles/org_federated_research/distributed_dp/fl_run.py", line 270, in evaluation_fn
    return federated_eval(state.model, evaluation_data)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/computation/computation_impl.py", line 119, in __call__
    return context.invoke(self, arg)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/execution_contexts/sync_execution_context.py", line 65, in invoke
    return self._event_loop.run_until_complete(
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/common_libs/retrying.py", line 91, in retry_coro_fn
    raise e
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/common_libs/retrying.py", line 88, in retry_coro_fn
    return await fn(*args, **kwargs)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/execution_contexts/async_execution_context.py", line 300, in invoke
    return await tracing.wrap_coroutine_in_current_trace_context(
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/common_libs/tracing.py", line 391, in _wrapped
    return await coro
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/execution_contexts/async_execution_context.py", line 141, in _invoke
    result = await executor.create_call(comp, arg)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/common_libs/tracing.py", line 201, in async_trace
    result = await fn(*fn_args, **fn_kwargs)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 346, in create_call
    return await comp_repr.invoke(self, arg)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 166, in invoke
    return await executor._evaluate(comp_lambda.result, new_scope)  # pylint: disable=protected-access
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 516, in _evaluate
    return await self._evaluate_block(comp, scope)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/common_libs/tracing.py", line 201, in async_trace
    result = await fn(*fn_args, **fn_kwargs)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 480, in _evaluate_block
    return await self._evaluate(comp.block.result, scope)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 508, in _evaluate
    return await self._evaluate_reference(comp, scope)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/common_libs/tracing.py", line 201, in async_trace
    result = await fn(*fn_args, **fn_kwargs)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 432, in _evaluate_reference
    return await scope.resolve_reference(comp.reference.name)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 115, in resolve_reference
    return await value
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 510, in _evaluate
    return await self._evaluate_call(comp, scope)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/common_libs/tracing.py", line 201, in async_trace
    result = await fn(*fn_args, **fn_kwargs)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 448, in _evaluate_call
    func, arg = await asyncio.gather(func, get_arg())
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 444, in get_arg
    return await self._evaluate(comp.call.argument, scope=scope)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 514, in _evaluate
    return await self._evaluate_struct(comp, scope)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/common_libs/tracing.py", line 201, in async_trace
    result = await fn(*fn_args, **fn_kwargs)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 468, in _evaluate_struct
    values = await asyncio.gather(*values)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 508, in _evaluate
    return await self._evaluate_reference(comp, scope)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/common_libs/tracing.py", line 201, in async_trace
    result = await fn(*fn_args, **fn_kwargs)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 432, in _evaluate_reference
    return await scope.resolve_reference(comp.reference.name)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 115, in resolve_reference
    return await value
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 510, in _evaluate
    return await self._evaluate_call(comp, scope)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/common_libs/tracing.py", line 201, in async_trace
    result = await fn(*fn_args, **fn_kwargs)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 448, in _evaluate_call
    func, arg = await asyncio.gather(func, get_arg())
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 444, in get_arg
    return await self._evaluate(comp.call.argument, scope=scope)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 514, in _evaluate
    return await self._evaluate_struct(comp, scope)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/common_libs/tracing.py", line 201, in async_trace
    result = await fn(*fn_args, **fn_kwargs)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 468, in _evaluate_struct
    values = await asyncio.gather(*values)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 508, in _evaluate
    return await self._evaluate_reference(comp, scope)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/common_libs/tracing.py", line 201, in async_trace
    result = await fn(*fn_args, **fn_kwargs)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 432, in _evaluate_reference
    return await scope.resolve_reference(comp.reference.name)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 115, in resolve_reference
    return await value
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 510, in _evaluate
    return await self._evaluate_call(comp, scope)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/common_libs/tracing.py", line 201, in async_trace
    result = await fn(*fn_args, **fn_kwargs)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 448, in _evaluate_call
    func, arg = await asyncio.gather(func, get_arg())
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 444, in get_arg
    return await self._evaluate(comp.call.argument, scope=scope)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 508, in _evaluate
    return await self._evaluate_reference(comp, scope)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/common_libs/tracing.py", line 201, in async_trace
    result = await fn(*fn_args, **fn_kwargs)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 432, in _evaluate_reference
    return await scope.resolve_reference(comp.reference.name)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 115, in resolve_reference
    return await value
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 510, in _evaluate
    return await self._evaluate_call(comp, scope)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/common_libs/tracing.py", line 201, in async_trace
    result = await fn(*fn_args, **fn_kwargs)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 448, in _evaluate_call
    func, arg = await asyncio.gather(func, get_arg())
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 444, in get_arg
    return await self._evaluate(comp.call.argument, scope=scope)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 514, in _evaluate
    return await self._evaluate_struct(comp, scope)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/common_libs/tracing.py", line 201, in async_trace
    result = await fn(*fn_args, **fn_kwargs)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 468, in _evaluate_struct
    values = await asyncio.gather(*values)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 508, in _evaluate
    return await self._evaluate_reference(comp, scope)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/common_libs/tracing.py", line 201, in async_trace
    result = await fn(*fn_args, **fn_kwargs)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 432, in _evaluate_reference
    return await scope.resolve_reference(comp.reference.name)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 115, in resolve_reference
    return await value
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 510, in _evaluate
    return await self._evaluate_call(comp, scope)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/common_libs/tracing.py", line 201, in async_trace
    result = await fn(*fn_args, **fn_kwargs)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 449, in _evaluate_call
    return await self.create_call(func, arg=arg)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/common_libs/tracing.py", line 201, in async_trace
    result = await fn(*fn_args, **fn_kwargs)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/reference_resolving_executor.py", line 342, in create_call
    return ReferenceResolvingExecutorValue(await
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/common_libs/tracing.py", line 201, in async_trace
    result = await fn(*fn_args, **fn_kwargs)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/thread_delegating_executor.py", line 125, in create_call
    return await self._delegate(self._target_executor.create_call(comp, arg))
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/thread_delegating_executor.py", line 110, in _delegate
    result_value = await _delegate_with_trace_ctx(coro, self._event_loop)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/common_libs/tracing.py", line 391, in _wrapped
    return await coro
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/common_libs/tracing.py", line 201, in async_trace
    result = await fn(*fn_args, **fn_kwargs)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/federating_executor.py", line 457, in create_call
    return await self._strategy.compute_federated_intrinsic(
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/federating_executor.py", line 143, in compute_federated_intrinsic
    return await fn(arg)  # pylint: disable=not-callable
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/common_libs/tracing.py", line 201, in async_trace
    result = await fn(*fn_args, **fn_kwargs)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/federated_resolving_strategy.py", line 458, in compute_federated_map
    return await self._map(arg, all_equal=False)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/common_libs/tracing.py", line 201, in async_trace
    result = await fn(*fn_args, **fn_kwargs)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/federated_resolving_strategy.py", line 339, in _map
    results = await asyncio.gather(*[
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/federated_resolving_strategy.py", line 336, in _map_child
    fn_at_child = await child.create_value(fn, fn_type)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/common_libs/tracing.py", line 201, in async_trace
    result = await fn(*fn_args, **fn_kwargs)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/thread_delegating_executor.py", line 115, in create_value
    return await self._delegate(
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/thread_delegating_executor.py", line 110, in _delegate
    result_value = await _delegate_with_trace_ctx(coro, self._event_loop)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/common_libs/tracing.py", line 391, in _wrapped
    return await coro
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/common_libs/tracing.py", line 201, in async_trace
    result = await fn(*fn_args, **fn_kwargs)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/eager_tf_executor.py", line 683, in create_value
    normalized_value = to_representation_for_type(value,
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/common_libs/tracing.py", line 228, in sync_trace
    result = fn(*fn_args, **fn_kwargs)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/eager_tf_executor.py", line 519, in to_representation_for_type
    return _to_computation_internal_rep(
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/common_libs/tracing.py", line 228, in sync_trace
    result = fn(*fn_args, **fn_kwargs)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/eager_tf_executor.py", line 405, in _to_computation_internal_rep
    embedded_fn = embed_tensorflow_computation(value, type_spec, device)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/common_libs/tracing.py", line 228, in sync_trace
    result = fn(*fn_args, **fn_kwargs)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/eager_tf_executor.py", line 273, in embed_tensorflow_computation
    comp = _ensure_comp_runtime_compatible(comp)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/eager_tf_executor.py", line 246, in _ensure_comp_runtime_compatible
    _check_dataset_reduce_for_multi_gpu(graph_def)
  File "/home/fraboeni/.conda/envs/tff/lib/python3.9/site-packages/tensorflow_federated/python/core/impl/executors/eager_tf_executor.py", line 63, in _check_dataset_reduce_for_multi_gpu
    raise ValueError(
ValueError: Detected dataset reduce op in multi-GPU TFF simulation: `use_experimental_simulation_loop=True` for `tff.learning`; or use `for ... in iter(dataset)` for your own dataset iterations. See https://www.tensorflow.org/federated/tutorials/simulations_with_accelerators for examples.

I tried fixing that by disabling GPU execution, inserting the following lines here (following the tutorial at https://www.tensorflow.org/federated/tutorials/simulations_with_accelerators):

cpu_device = tf.config.list_logical_devices('CPU')[0]
tff.backends.native.set_local_python_execution_context(
    server_tf_device=cpu_device, client_tf_devices=[cpu_device])

and simply re-ran the command.

However, the error stayed the same. Would I have to do some kind of rebuild, or can you recommend another way to get rid of the error coming from TFF?

Thank you very much!

@zcharles8 (Contributor)

@fraboeni Can you see what happens if you try toggling this line:

use_experimental_simulation_loop=True)

For context, the client training that is part of tff.learning.build_federated_averaging_process can go in one of two ways depending on whether you set use_experimental_simulation_loop to True or False. Generally, setting this to True is for multi-GPU simulations.
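
For illustration, here is a minimal sketch of how that flag is passed (the model and optimizers below are hypothetical placeholders for this example, not the actual code in fl_run.py):

import tensorflow as tf
import tensorflow_federated as tff

def model_fn():
  # Hypothetical toy model; the repo builds its own task-specific model.
  keras_model = tf.keras.Sequential([
      tf.keras.layers.InputLayer(input_shape=(784,)),
      tf.keras.layers.Dense(10, activation='softmax'),
  ])
  return tff.learning.from_keras_model(
      keras_model,
      input_spec=(tf.TensorSpec([None, 784], tf.float32),
                  tf.TensorSpec([None, 1], tf.int32)),
      loss=tf.keras.losses.SparseCategoricalCrossentropy())

iterative_process = tff.learning.build_federated_averaging_process(
    model_fn=model_fn,
    client_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=0.03),
    server_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=1.0),
    # True selects the client loop intended for (multi-)GPU simulations;
    # False uses the default dataset-reduce-based loop.
    use_experimental_simulation_loop=True)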

@zcharles8 (Contributor)

Also, for context @kenziyuliu: I believe the nightly TFF packages are currently broken, so using the latest released version (as in your comment above) is the recommended way to proceed.

@fraboeni (Author)

Thanks for your prompt answer @zcharles8!

Unfortunately, whether I set the indicated line to True or False, I still get the same error.

@zcharles8 (Contributor)

@fraboeni Is that true if you don't add the call to tff.backends.native.set_local_python_execution_context that you described above?

For context, I just ran the command you posted above (purely on CPU) and it worked fine using the default executor.

@zcharles8 (Contributor)

Oh wait, I see the potential problem. @fraboeni Based on the error, it sounds like you are using a multi-GPU environment. If that is the case, then you would need to alter this line: https://github.com/google-research/federated/blob/master/distributed_dp/fl_run.py#L266

In particular, set use_experimental_simulation_loop=True, matching the argument in tff.learning.build_federated_averaging_process. Let me know if that helps at all, and thanks for digging into this.

@fraboeni (Author)

Thank you very much @zcharles8.

Unfortunately, passing the parameter in the line you indicated also does not solve the issue:
federated_eval = tff.learning.build_federated_evaluation(
    task.model_fn, use_experimental_simulation_loop=True)

I also tried switching off the GPUs with

cpu_device = tf.config.list_logical_devices('CPU')[0]
tff.backends.native.set_local_python_execution_context(
    server_tf_device=cpu_device, client_tf_devices=[cpu_device])

I also tried using only one GPU via that command. Unfortunately, nothing seems to change the error.

@fraboeni (Author)

Hi @zcharles8, is there any news from your side on how we could get the code here to run?

kenziyuliu (Contributor) commented Mar 25, 2022

Hi @fraboeni, I tried following #57 (comment) on a single-GPU machine, and by default things seem to work fine.

Specifically, I followed #57 (comment), fixed the error in #58, and checked that TF sees the GPU as

>>> tf.config.list_physical_devices()
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

Running the example script from here seems to work (bazel run :fl_run -- ...). If it's a multi-GPU issue, maybe try forcing a single GPU as a workaround via export CUDA_VISIBLE_DEVICES=0. Hope this helps!
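
For example, something like the following (a workaround sketch only; device index 0 is arbitrary):

export CUDA_VISIBLE_DEVICES=0   # expose only the first GPU to TensorFlow
bazel run :fl_run -- --task=emnist_character ...  # plus the remaining flags from the example command above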

@DeepaliKushwaha

Can anyone help me solve the same issue while using tff.templates.IterativeProcess instead of tff.learning.build_federated_averaging_process?

@kairouzp (Contributor)

Could you please expand a bit more on what exactly you are doing? Are you creating a custom iterative process or using one that we provide in the repo? Could you also please provide a snippet of the error you are seeing?
