5.0rc2: java.lang.RuntimeException: Unrecoverable error while evaluating node 'UnshareableActionLookupData #14286

Closed
brentleyjones opened this issue Nov 16, 2021 · 9 comments
Labels: team-OSS (Issues for the Bazel OSS team: installation, release process, Bazel packaging, website), type: bug, untriaged

Comments

@brentleyjones
Contributor

brentleyjones commented Nov 16, 2021

Description of the problem / feature request:

New crash in 5.0.0rc2 that wasn't there in 5.0.0rc1. Probably related to 3947c83 (because of the MerkleTree frames in the stack trace).

FATAL: bazel crashed due to an internal error. Printing stack trace:
--
  | java.lang.RuntimeException: Unrecoverable error while evaluating node 'UnshareableActionLookupData{actionLookupKey=ConfiguredTargetKey{label=//Modules/BillingFrequencyUI:BillingFrequencyUISnapshotTests, config=BuildConfigurationValue.Key[a06c80cd72112c52f5a0baac012c8dd850e1084bd4132ee510a82e00988e02ae]}, actionIndex=4}' (requested by nodes 'TestCompletionKey{configuredTargetKey=ConfiguredTargetKey{label=//Modules/BillingFrequencyUI:BillingFrequencyUISnapshotTests, config=BuildConfigurationValue.Key[a06c80cd72112c52f5a0baac012c8dd850e1084bd4132ee510a82e00988e02ae]}, topLevelArtifactContext=com.google.devtools.build.lib.analysis.TopLevelArtifactContext@90904c3b, exclusiveTesting=false}')
  | at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:674)
  | at com.google.devtools.build.lib.concurrent.AbstractQueueVisitor$WrappedRunnable.run(AbstractQueueVisitor.java:382)
  | at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
  | at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
  | at java.base/java.lang.Thread.run(Unknown Source)
  | Caused by: java.lang.IllegalStateException: rootMerkleTree.getInputFiles() 2852 != tree.numFiles() 2960
  | at com.google.common.base.Preconditions.checkState(Preconditions.java:736)
  | at com.google.devtools.build.lib.remote.merkletree.MerkleTree.build(MerkleTree.java:262)
  | at com.google.devtools.build.lib.remote.merkletree.MerkleTree.build(MerkleTree.java:218)
  | at com.google.devtools.build.lib.remote.RemoteExecutionService.buildInputMerkleTree(RemoteExecutionService.java:394)
  | at com.google.devtools.build.lib.remote.RemoteExecutionService.buildRemoteAction(RemoteExecutionService.java:426)
  | at com.google.devtools.build.lib.remote.RemoteSpawnCache.lookup(RemoteSpawnCache.java:102)
  | at com.google.devtools.build.lib.exec.AbstractSpawnStrategy.exec(AbstractSpawnStrategy.java:141)
  | at com.google.devtools.build.lib.exec.AbstractSpawnStrategy.exec(AbstractSpawnStrategy.java:108)
  | at com.google.devtools.build.lib.actions.SpawnStrategy.beginExecution(SpawnStrategy.java:47)
  | at com.google.devtools.build.lib.exec.SpawnStrategyResolver.beginExecution(SpawnStrategyResolver.java:68)
  | at com.google.devtools.build.lib.exec.StandaloneTestStrategy.beginTestAttempt(StandaloneTestStrategy.java:439)
  | at com.google.devtools.build.lib.exec.StandaloneTestStrategy.access$200(StandaloneTestStrategy.java:84)
  | at com.google.devtools.build.lib.exec.StandaloneTestStrategy$StandaloneTestRunnerSpawn.beginExecution(StandaloneTestStrategy.java:674)
  | at com.google.devtools.build.lib.analysis.test.TestRunnerAction.beginIfNotCancelled(TestRunnerAction.java:905)
  | at com.google.devtools.build.lib.analysis.test.TestRunnerAction.beginExecution(TestRunnerAction.java:872)
  | at com.google.devtools.build.lib.analysis.test.TestRunnerAction.execute(TestRunnerAction.java:930)
  | at com.google.devtools.build.lib.analysis.test.TestRunnerAction.execute(TestRunnerAction.java:921)
  | at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$5.execute(SkyframeActionExecutor.java:909)
  | at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$ActionRunner.continueAction(SkyframeActionExecutor.java:1078)
  | at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$ActionRunner.run(SkyframeActionExecutor.java:1033)
  | at com.google.devtools.build.lib.skyframe.ActionExecutionState.runStateMachine(ActionExecutionState.java:152)
  | at com.google.devtools.build.lib.skyframe.ActionExecutionState.getResultOrDependOnFuture(ActionExecutionState.java:91)
  | at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor.executeAction(SkyframeActionExecutor.java:496)
  | at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.checkCacheAndExecuteIfNeeded(ActionExecutionFunction.java:856)
  | at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.computeInternal(ActionExecutionFunction.java:349)
  | at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.compute(ActionExecutionFunction.java:169)
  | at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:590)
  | ... 4 more

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Replace this line with your answer.

What operating system are you running Bazel on?

macOS 11.6.1

What's the output of bazel info release?

release 5.0.0rc2

@meisterT
Member

cc @coeuvre @moroten

@Wyverald Wyverald added this to the Bazel 5.0 Release Blockers milestone Nov 16, 2021
@moroten
Contributor

moroten commented Nov 17, 2021

@brentleyjones Does this happen without enabling the experimental feature?
What command line did you use and was it on a public repository?

@brentleyjones
Contributor Author

Yes, we aren't using the experimental feature. It's a private repo. I'll grab the command line soon.

@brentleyjones
Contributor Author

Effective command line:

bazel test \
  --isatty=0 \
  --terminal_columns=80 \
  --experimental_allow_tags_propagation=1 \
  --cxxopt=-Wno-missing-field-initializers \
  --action_env=ZERO_AR_DATE=1 \
  --experimental_worker_allow_json_protocol=1 \
  --host_swiftcopt=-disallow-use-new-driver \
  --swiftcopt=-disallow-use-new-driver \
  --color=yes \
  --define=apple.compress_ipa=yes \
  --define=apple.package_swift_support=no \
  --enable_platform_specific_config=1 \
  --host_linkopt=-Wl,-no_objc_category_merging \
  --linkopt=-Wl,-no_objc_category_merging \
  --host_linkopt=-Wl,-arch_errors_fatal \
  --host_linkopt=-Wl,-fatal_warnings \
  --host_linkopt=-Wl,-unaligned_pointers,error \
  --linkopt=-Wl,-arch_errors_fatal \
  --linkopt=-Wl,-fatal_warnings \
  --linkopt=-Wl,-no_function_starts \
  --linkopt=-Wl,-unaligned_pointers,error \
  --experimental_build_event_upload_strategy=local \
  --experimental_guard_against_concurrent_changes=1 \
  --experimental_remote_cache_async=1 \
  --experimental_repository_downloader_retries=2 \
  --experimental_use_llvm_covmap=1 \
  --features=debug_prefix_map_pwd_is_dot \
  --features=oso_prefix_is_pwd \
  --features=relative_ast_path \
  --features=remap_xcode_path \
  --features=swift.cacheable_swiftmodules \
  --features=swift.coverage_prefix_map \
  --features=swift.no_embed_debug_module \
  --features=swift.opt_uses_osize \
  --features=swift.opt_uses_wmo \
  --features=swift.remap_xcode_path \
  --features=swift.use_global_module_cache \
  --features=swift.use_response_files \
  --features=swift.vfsoverlay \
  --grpc_keepalive_time=20s \
  --host_copt=-Werror \
  --host_swiftcopt=-warnings-as-errors \
  --incompatible_avoid_hardcoded_objc_compilation_flags=1 \
  --incompatible_default_to_explicit_init_py=1 \
  --incompatible_remote_results_ignore_disk=1 \
  --instrumentation_filter=//Modules[/:] \
  --ios_minimum_os=12.0 \
  --ios_simulator_device=iPhone X \
  --ios_simulator_version=15.0 \
  --local_test_jobs=1 \
  --macos_minimum_os=11.0 \
  --objccopt=-DNDEBUG=1 \
  --objccopt=-Oz \
  --objccopt=-Winit-self \
  --objccopt=-Wno-extra \
  --objccopt=-Wno-unused-variable \
  --remote_local_fallback=1 \
  --strip=never \
  --swiftcopt=-swift-version \
  --swiftcopt=5 \
  --host_swiftcopt=-swift-version \
  --host_swiftcopt=5 \
  --swiftcopt=-whole-module-optimization \
  --host_swiftcopt=-whole-module-optimization \
  --test_env=BBL_REPO_PATH \
  --test_env=LCOV_MERGER=/usr/bin/true \
  --test_env=LOCAL_BBL \
  --test_env=PYTHONNOUSERSITE=affirmative \
  --test_keep_going=false \
  --test_timeout=-1,-1,-1,7200 \
  --trim_test_configuration=1 \
  --use_top_level_targets_for_symlinks=1 \
  --worker_max_instances=HOST_CPUS \
  --xcode_version_config=//bazel/third_party:xcode_config_local \
  --remote_default_exec_properties=clean-workspace-inputs=* \
  --remote_default_exec_properties=OSFamily=darwin \
  --remote_default_exec_properties=recycle-runner=true \
  --xcode_version=13A233 \
  --modify_execution_info=^(BitcodeSymbolsCopy|BundleApp|BundleTreeApp|DsymDwarf|DsymLipo|GenerateAppleSymbolsFile|ObjcBinarySymbolStrip|ObjcLink|ProcessAndSign|SwiftArchive)$=+no-remote-cache \
  --config=bes \
  --build_metadata=ALLOW_ENV=PATH \
  --workspace_status_command=$(pwd)/bazel/workspace-status.sh \
  --config=buildbuddycache \
  --remote_cache=remote.buildbuddy.io \
  --remote_instance_name=ios/1 \
  --config=trace \
  --build_event_json_file=./tmp/logs/bep.json \
  --execution_log_json_file=./tmp/logs/execution.json \
  --generate_json_trace_profile=1 \
  --experimental_remote_capture_corrupted_outputs=./tmp/logs/corrupted_outputs \
  --experimental_remote_grpc_log=./tmp/logs/grpc.log \
  --explain=./tmp/logs/explanation.log \
  --profile=./tmp/logs/trace.json \
  --verbose_explanations=1 \
  --config=ci \
  --disk_cache= \
  --remote_upload_local_results=1 \
  --show_timestamps=1 \
  --verbose_failures=1 \
  --bes_upload_mode=wait_for_upload_complete \
  --remote_header=<REDACTED> \
  --embed_label=1637088127 \
  --define=apple.experimental.tree_artifact_outputs=1 \
  --test_output=errors \
  --rc_source=client \
  --rc_source=/Users/iosci/buildkite/src/iosci94/lyft/ios/.bazelrc \
  --rc_source=/Users/iosci/buildkite/src/iosci94/lyft/ios/tmp/extra.bazelrc \
  --rc_source=/Users/iosci/buildkite/src/iosci94/lyft/ios/bazel/noremote.bazelrc \
  --rc_source=/Users/iosci/buildkite/src/iosci94/lyft/ios/tmp/ci.bazelrc \
  --rc_source=/Users/iosci/buildkite/src/iosci94/lyft/ios/bazel/ci.bazelrc \
  --startup_time=68 \
  --command_wait_time=0 \
  --extract_data_time=0 \
  --binary_path=/Users/iosci/Library/Caches/bazelisk/downloads/bazelbuild/bazel-5.0.0rc2-darwin-x86_64/bin/bazel \
  --client_env=BAZEL_IGNORE_SYSTEM_HEADERS_VERSIONS=<REDACTED> \
  --client_env=BAZEL_USE_XCODE_TOOLCHAIN=<REDACTED> \
  --client_env=CC=<REDACTED> \
  --client_env=PATH=/usr/bin:/bin \
  --client_env=PYTHONNOUSERSITE=<REDACTED> \
  --client_env=USER=iosci \
  --client_env=HOME=<REDACTED> \
  --client_env=TERM=<REDACTED> \
  --client_env=JOB_IS_RUNNING_ON_CI=<REDACTED> \
  --client_env=PR_NUMBER=<REDACTED> \
  --client_env=BUILDKITE_BRANCH=bj/bazel-upgrade-to-5.0.0rc2 \
  --client_env=BUILDKITE_BUILD_URL=https://buildkite.com/lyft/ios/builds/784580 \
  --client_env=BUILDKITE_COMMIT=e074dde45f1602b3ec5d3dc8d9e22728b144e89b \
  --client_env=BUILDKITE_JOB_ID=2022d4d4-de85-4371-91b9-996670dadda0 \
  --client_env=USE_CLANG_CL=<REDACTED> \
  --client_env=__CF_USER_TEXT_ENCODING=<REDACTED> \
  --client_cwd=/Users/iosci/buildkite/src/iosci94/lyft/ios \
  --config=QA \
  --config=qa \
  --compilation_mode=fastbuild \
  --config=nosandbox \
  --remote_default_exec_properties=preserve-workspace=true \
  --spawn_strategy=remote,worker,local \
  --worker_sandboxing=false \
  --config=warnings_as_errors \
  --config=clang_warnings \
  --copt=-Wall \
  --copt=-Wextra \
  --copt=-Wpedantic \
  --copt=-Werror \
  --copt=-Wno-gnu-conditional-omitted-operand \
  --copt=-Wno-gnu-statement-expression \
  --copt=-Wno-unused-parameter \
  --swiftcopt=-warnings-as-errors \
  --define=apple.add_debugger_entitlement=false \
  --define=exclude_symbols=yes \
  --define=flavor=qa \
  --swiftcopt=-gnone \
  --test_env=RECORD_MODE=0 \
  --test_env=TREAT_RECORDINGS_AS_ARTIFACTS=1 \
  --target_pattern_file=/var/folders/y3/981t90z14c73952t8nvxylcw0000gn/T/tmp.XfO8IeR0

@brentleyjones
Contributor Author

brentleyjones commented Nov 17, 2021

Of note, this only happens in our builds that depend on a custom repository rule:

snapshot_repo.bzl:

"""
A rule to create and download snapshot artifacts
"""

load(
    "@bazel_tools//tools/build_defs/repo:utils.bzl",
    "read_netrc",
    "use_netrc",
)

def _impl(ctx):
    module = ctx.attr.module
    name = ctx.attr.name

    ctx.file("BUILD.bazel", """filegroup(
    name = "{name}",
    srcs = glob(["**"], exclude_directories = 0) + ["@//Modules/{module}:snapshots.json"],
    visibility = ["@//Modules/{module}:__pkg__"],
)
""".format(name = name, module = module))

    manifest_path = ctx.path(Label("@//Modules/{}:snapshots.json".format(module)))
    if not manifest_path.exists:
        return

    manifest = ctx.read(manifest_path)
    manifest_json = json.decode(manifest)
    snapshots_sha256 = manifest_json.get("snapshots-sha256") or manifest_json.get(module)
    if not snapshots_sha256:
        return

    archive_sha256 = manifest_json.get("archive-sha256", "")
    repo = "local-mobile-snapshots"
    path = "ios/snapshots/Modules/{0}/{snapshots_sha}/snapshots{archive_sha}.zip".format(module, snapshots_sha = snapshots_sha256, archive_sha = archive_sha256)
    primary_url = "https://artifactory-n.lyft.net/artifactory/{}/{}".format(repo, path)
    fallback_url = "https://artifactory-1.lyft.net/artifactory/{}/{}".format(repo, path)
    urls = [primary_url, fallback_url]
    netrc = read_netrc(ctx, ctx.attr._netrc)
    auth = use_netrc(netrc, urls, {})

    download_info = ctx.download_and_extract(
        url = urls,
        auth = auth,
        sha256 = archive_sha256,
    )

snapshot_repo = repository_rule(
    implementation = _impl,
    attrs = dict(
        module = attr.string(mandatory = True),
        _netrc = attr.label(
            default = "//bazel/internal:artifactory.netrc",
            allow_single_file = True,
        ),
    ),
)

snapshot_repositories.bzl:

"""
This creates snapshot repositories for the snapshot modules
"""

load("@SnapshotModules//:snapshot_modules.bzl", "SNAPSHOT_MODULES")
load("//bazel/internal:snapshot_repo.bzl", "snapshot_repo")

def snapshot_repositories():
    """Load the snapshot data from snapshot repositories"""
    for module in SNAPSHOT_MODULES:
        name = module + "Snapshots"
        snapshot_repo(name = name, module = module)
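A possible link between this rule and the crash, offered as an assumption rather than a confirmed diagnosis: the generated filegroup uses glob(["**"], exclude_directories = 0), and with directories not excluded the glob is documented to return directories as well as the files under them. The srcs list would then contain overlapping entries. The Python sketch below only models that overlap; glob_all, ancestors, overlapping_inputs, and the sample paths are hypothetical and are not Bazel APIs.

```python
# Illustrative model only; this is not Bazel's glob implementation.
import posixpath


def ancestors(path):
    """Yield every parent directory of a path, nearest first."""
    parent = posixpath.dirname(path)
    while parent and parent != path:
        yield parent
        path, parent = parent, posixpath.dirname(parent)


def glob_all(files, exclude_directories=True):
    """Model a recursive '**' match over a known list of file paths."""
    matches = set(files)
    if not exclude_directories:
        # With directories included, every parent directory also matches.
        for path in files:
            matches.update(ancestors(path))
    return sorted(matches)


def overlapping_inputs(files, srcs):
    """Return (directory, file) pairs where both are listed in srcs."""
    listed = set(srcs)
    return sorted(
        (directory, path)
        for path in files
        if path in listed
        for directory in ancestors(path)
        if directory in listed
    )


# Hypothetical snapshot layout, made up for this example.
snapshot_files = ["Dark/home.png", "Light/home.png"]

with_dirs = glob_all(snapshot_files, exclude_directories=False)
without_dirs = glob_all(snapshot_files, exclude_directories=True)

print(with_dirs)
print(overlapping_inputs(snapshot_files, with_dirs))
print(overlapping_inputs(snapshot_files, without_dirs))
```

In this model, each file appears once explicitly and once through its parent directory when directories are included, and no overlap remains when they are excluded.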

@moroten
Contributor

moroten commented Nov 18, 2021

I've reproduced it with a minimal example: a single file, mydir/bar.txt, and a BUILD.bazel with the following content:

genrule(
  name = 'foo',
  srcs = ["mydir", "mydir/bar.txt"],
  outs = ["foo.txt"],
  cmd = "echo \"foo bar\" > $@",
)

This crashes with rootMerkleTree.getInputFiles() 2 != tree.numFiles() 3. The MerkleTree builder cannot handle an action whose inputs include both a directory and a file inside that directory. The file probably ends up as a duplicated entry somewhere.
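The suspected duplication can be modelled in a few lines of Python. This is a simplified sketch, not Bazel's Java code: NaiveTreeBuilder, expand_inputs, and the directory listing are hypothetical names invented for illustration. It assumes a builder that bumps a counter on every add while storing files in a map that silently absorbs duplicates.

```python
# Simplified model of the failure; names are hypothetical and do not
# mirror Bazel's actual tree-building classes.
import posixpath


class NaiveTreeBuilder:
    """Counts every add_file() call, even when the path is already stored."""

    def __init__(self):
        self.tree = {}       # directory path -> set of file names
        self.num_files = 0   # running counter, incremented on every add

    def add_file(self, path):
        directory, name = posixpath.split(path)
        self.tree.setdefault(directory, set()).add(name)
        self.num_files += 1  # bug: also incremented for a duplicate path

    def unique_files(self):
        return sum(len(names) for names in self.tree.values())


def expand_inputs(inputs, directory_contents):
    """Expand directory inputs into the files they contain."""
    for item in inputs:
        if item in directory_contents:
            for name in directory_contents[item]:
                yield posixpath.join(item, name)
        else:
            yield item


builder = NaiveTreeBuilder()
# "mydir" is a directory input; "mydir/bar.txt" is also listed explicitly.
for path in expand_inputs(["mydir", "mydir/bar.txt"], {"mydir": ["bar.txt"]}):
    builder.add_file(path)

mismatch = builder.unique_files() != builder.num_files
print(builder.unique_files(), builder.num_files, mismatch)
```

The absolute numbers differ from the ones in the crash message because this model contains only one file. What it shows is the gap of one between the files actually stored and the files counted, which is the kind of mismatch the failing check reports.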

@meisterT meisterT added the release blocker, type: bug, untriaged, and team-OSS (Issues for the Bazel OSS team: installation, release process, Bazel packaging, website) labels Nov 18, 2021
moroten added a commit to moroten/bazel that referenced this issue Nov 19, 2021
The DirectoryTreeBuilder did not check if files already existed in the
resulting map, so the file counter got wrong and an assertion failed.

Fixes bazelbuild#14286.
moroten added a commit to moroten/bazel that referenced this issue Nov 19, 2021
The DirectoryTreeBuilder did not check if files already existed in the
resulting map, so the file counter got wrong and an assertion failed.

The error was visible when adding a file and the directory containing
that file as inputs for an action.

Fixes bazelbuild#14286.
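
The guard described in this commit message can be sketched as follows. This is a hedged Python illustration, not the actual change in #14299: DedupingTreeBuilder and its methods are hypothetical names, and the only idea taken from the commit message is to check whether a file already exists in the map before counting it.

```python
# Hypothetical sketch of the guard; it does not mirror Bazel's Java code.
import posixpath


class DedupingTreeBuilder:
    """Counts a file only the first time its path is added."""

    def __init__(self):
        self.tree = {}       # directory path -> set of file names
        self.num_files = 0   # counts distinct files only

    def add_file(self, path):
        directory, name = posixpath.split(path)
        names = self.tree.setdefault(directory, set())
        if name in names:
            return False     # already present: leave the counter alone
        names.add(name)
        self.num_files += 1
        return True

    def unique_files(self):
        return sum(len(names) for names in self.tree.values())


builder = DedupingTreeBuilder()
# The same file arrives twice: once via its directory, once explicitly.
results = [
    builder.add_file(path)
    for path in ["mydir/bar.txt", "mydir/bar.txt", "mydir/baz.txt"]
]

print(results, builder.num_files, builder.unique_files())
```

With the check in place, the counter and the number of stored files stay equal, so a consistency check comparing the two would pass in this model.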
@moroten
Contributor

moroten commented Nov 22, 2021

#14299 is ready for review to solve this blocker.

@brentleyjones
Contributor Author

@meteorcloudy can we reopen this until it's in the rc branch?

@meteorcloudy meteorcloudy reopened this Nov 24, 2021
meteorcloudy pushed a commit to meteorcloudy/bazel that referenced this issue Nov 25, 2021
The DirectoryTreeBuilder did not check if files already existed in the resulting map, so the file counter got wrong and an assertion failed.

The error was visible when adding a file and the directory containing that file as inputs for an action.

Fixes bazelbuild#14286.

Closes bazelbuild#14299.

PiperOrigin-RevId: 412051374
coeuvre pushed a commit to coeuvre/bazel that referenced this issue Nov 25, 2021
meteorcloudy pushed a commit that referenced this issue Nov 25, 2021
The DirectoryTreeBuilder did not check if files already existed in the resulting map, so the file counter got wrong and an assertion failed.

The error was visible when adding a file and the directory containing that file as inputs for an action.

Fixes #14286.

Closes #14299.

PiperOrigin-RevId: 412051374

Co-authored-by: Fredrik Medley <[email protected]>
@meteorcloudy
Member

Merged into 5.0.0rc2

Bencodes pushed a commit to Bencodes/bazel that referenced this issue Jan 10, 2022