Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace #4505 with a different set of workarounds #4527

Open
wants to merge 7 commits into
base: trunk
Choose a base branch
from

Conversation

chandlerc
Copy link
Contributor

@chandlerc chandlerc commented Nov 14, 2024

This restores the symlinks for the installation, but teaches the busybox
info search to look for a relative path to the busybox binary itself
before walking through symlinks. This let's it find the tree structure
when directly invoking prefix_root/bin/carbon or similar, either
inside of a Bazel rule or from the command line, and mirrors how we
expect the installed tree to look. This works even when Bazel resolves
the symlink target fully, and potentially to something nonsensical like
a CAS file.

In order to make a convenient Bazel target that can be used with bazel run //toolchain, this adds an override to explicitly set the desired
argv[0] to use when selecting a mode for the busybox and a busybox
binary. Currently, the workaround uses an environment variable because
that required the least amount of plumbing, and seems a useful override
mechanism generally, but I'm open to other approaches.

This should allow a few things to work a bit more nicely:

  • It should handle sibling symlinks like clang++ to clang or
    ld.lld to lld, where that symlink in turn points at the busybox.
    We want to use initial argv[0] value to select the mode there.
  • It avoids bouncing through Python (or other subprocesses) when
    invoking the carbon binary in Bazel rules, which will be nice for
    building the example code and benchmarking.

It does come at a cost of removing one feature: the initial symlink
can't be some unrelated alias like my_carbon_symlink -- we expect the
first argv[0] name to have the meaningful filename for selecting
a busybox mode.

It also trades the complexity of the Python script for some complexity
in the busybox search in order to look for a relative carbon-busybox
binary. On the whole, I think that tradeoff is worthwhile, but it isn't
free.

@chandlerc chandlerc requested a review from jonmeow November 14, 2024 09:56
@github-actions github-actions bot added documentation An issue or proposed change to our documentation toolchain labels Nov 14, 2024
@jonmeow
Copy link
Contributor

jonmeow commented Nov 14, 2024

Maybe I'm missing something, but the use of copy_file to create a symlink looks equivalent to a rollback of #4505. I think bazelbuild/bazel#23620 is correct about workaround options, but let me give my understanding of what's happening.

This change tries to create a symlink chain such as:

./bazel-bin/toolchain/install/run/carbon -> /.../toolchain/install/prefix_root/bin/carbon
/.../toolchain/install/prefix_root/bin/carbon -> /.../toolchain/install/prefix_root/lib/carbon/carbon-busybox
/.../toolchain/install/prefix_root/lib/carbon/carbon-busybox (regular file)

While that's typical output, Bazel could use its CAS (content-addressable storage) cache. In this model, Bazel's promise is to provide something which matches the checksum of the regular file, not the file/symlink structure (this is bazelbuild/bazel#23620). This is also true of carbon-busybox, which is not required to be a regular file: it just needs equivalent content. For example, this is also valid:

./bazel-bin/toolchain/install/run/carbon -> /cas/somechecksum
/.../toolchain/install/prefix_root/bin/carbon -> /cas/somechecksum
/.../toolchain/install/prefix_root/lib/carbon/carbon-busybox -> /cas/somechecksum

Note that things like remote caching can be a lot firmer about removing indirections, even more than the local cache. That's why we need bazelbuild/bazel#23620 to be fixed if we want to use symlinks with Bazel, since it's about retaining symlink structure.

As far as alternative workarounds:

So that's why I went with something like the 2nd workaround: scripts rather than symlinks.

I believe Google's Bazel support makes much more eager decisions about when to reuse cache content, and how to structure symlinks. If you want to go this route, perhaps it would make sense to verify that it works robustly using your access to Google's infrastructure?

@jonmeow
Copy link
Contributor

jonmeow commented Nov 14, 2024

Oh, and to be sure, I believe that the main difference prior to #4505 is in changing the name "run_carbon" to "run/carbon". But since it's going back to symlinks, I do think that this is brittle. Maybe I could have been clearer in-person, but I think any solution using symlinks is going to require a split implementation: one for this repository on GitHub, one for Google.

@chandlerc
Copy link
Contributor Author

Oh, and to be sure, I believe that the main difference prior to #4505 is in changing the name "run_carbon" to "run/carbon".

Yes, it was trying to couple this change with the change to prevent a symlink for the busybox itself.

But thanks for the context on worrying about remote caches forcing even the original busybox to be a symlink into some CAS or other thing that loses all the interesting context. That was what I was missing that #4505 was trying to work around. Sorry if it wasn't clear -- this PR was mostly a question, I wasn't at all confident.

I have another idea that I'm going to play with, but not sure it will work. But now I know how to test that.

@chandlerc chandlerc force-pushed the busybox-take3 branch 2 times, most recently from 42849ee to d751db4 Compare November 21, 2024 03:04
@chandlerc chandlerc changed the title Replace #4505 with a hopefully simpler workaround Replace #4505 with a different set of workarounds Nov 21, 2024
This restores the symlinks for the installation, but teaches the busybox
info search to look for a relative path to the busybox binary itself
before walking through symlinks. This let's it find the tree structure
when directly invoking `prefix_root/bin/carbon` or similar, either
inside of a Bazel rule or from the command line, and mirrors how we
expect the installed tree to look. This works even when Bazel resolves
the symlink target fully, and potentially to something nonsensical like
a CAS file.

In order to make a convenient Bazel target that can be used with `bazel
run //toolchain`, this adds an override to explicitly set the desired
argv[0] to use when selecting a mode for the busybox and a busybox
binary. Currently, the workaround uses an environment variable because
that required the least amount of plumbing, and seems a useful override
mechanism generally, but I'm open to other approaches.

This should allow a few things to work a bit more nicely:
- It should handle sibling symlinks like `clang++` to `clang` or
  `ld.lld` to `lld`, where that symlink in turn points at the busybox.
  We want to use *initial* `argv[0]` value to select the mode there.
- It avoids bouncing through Python (or other subprocesses) when
  invoking the `carbon` binary in Bazel rules, which will be nice for
  building the example code and benchmarking.

It does come at a cost of removing one feature: the initial symlink
can't be some unrelated alias like `my_carbon_symlink` -- we expect the
*first* argv[0] name to have the meaningful filename for selecting
a busybox mode.

It also trades the complexity of the Python script for some complexity
in the busybox search in order to look for a relative `carbon-busybox`
binary. On the whole, I think that tradeoff is worthwhile, but it isn't
free.
@chandlerc
Copy link
Contributor Author

Ok, after this helpful feedback and some offline discussions, I think a new attempt.

The core of this is to look for a relative carbon-busybox binary prior to walking through symlinks, and an override to use from the Bazel run_tool rule.

Should get us symlink overhead but be resilient to all the perplexities of symlinks (I hope!). PTAL!

@chandlerc chandlerc requested a review from jonmeow November 21, 2024 03:26
@chandlerc
Copy link
Contributor Author

(Also, just flagging that I updated the PR description with new context.)

Copy link
Contributor

@jonmeow jonmeow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood you intend things to change, I still would appreciate tests demonstrating the behavior that's not intended to work anymore. It would make it clearer that it's an intended break, not an accidental one. (suggestions for specific things below)

toolchain/install/busybox_info.h Outdated Show resolved Hide resolved
toolchain/install/busybox_info_test.cpp Outdated Show resolved Hide resolved
toolchain/install/busybox_info_test.cpp Outdated Show resolved Hide resolved
toolchain/install/busybox_info_test.cpp Outdated Show resolved Hide resolved
toolchain/install/busybox_info_test.cpp Outdated Show resolved Hide resolved
@@ -166,5 +167,58 @@ TEST_F(BusyboxInfoTest, NotBusyboxSymlink) {
EXPECT_FALSE(info.ok());
}

TEST_F(BusyboxInfoTest, LayerSymlinksInstallTree) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yesterday, one of the concerns I mentioned was incorrect install selection. You'd said you're okay with that, but would it be worth adding a test that demonstrates the expected result?

(note, you could share install tree creation via a helper with LayerSymlinksInstallTree)

e.g., choosing "/opt" because that's a thing:

TEST_F(BusyBoxInfoTest, SymlinkAtSameLevelReturnsWrongInstall) {
  // Just enough of the system install that incorrect looks can find the busybox.
  MakeDir(dir_ / "lib");
  MakeDir(dir_ / "lib/carbon");
  MakeFile(dir_ / "lib/carbon/carbon-busybox");

  // What it would look like if someone also did an install in "opt".
  MakeDir(dir_ / "opt");
  MakeDir(dir_ / "opt/lib");
  MakeDir(dir_ / "opt/lib/carbon");
  MakeFile(dir_ / "opt/lib/carbon/carbon-busybox");
  MakeDir(dir_ / "opt/lib/carbon/llvm");
  MakeDir(dir_ / "opt/lib/carbon/llvm/bin");
  MakeSymlink(dir_ / "opt/lib/carbon/llvm/bin/clang", "../../carbon-busybox")
  MakeDir(dir_ / "opt/bin");
  MakeSymlink(dir_ / "opt/bin/carbon", "../lib/carbon/carbon-busybox")

  auto target = MakeSymlink(dir_ / "opt/carbon", "bin/carbon");

  auto info = GetBusyboxInfo(target.string());
  ASSERT_TRUE(info.ok()) << info.error();
  EXPECT_THAT(info->bin_path, Eq(dir_ / "opt/../lib/carbon/carbon-busybox"));
  EXPECT_THAT(info->mode, Eq("clang"));
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call. I've framed it more as a "positive" test that the busybox info finds a relative busybox before walking symlinks further. I think this reflects what you're looking for, but tell me if not.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per discussion, I was trying to note the case of "/bin/carbon" versus "/opt/carbon" getting mixed up when "/opt/carbon" is pointing at an install under "/opt" instead of "/". I think the test is doing something different.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, I've added dedicated testing for this scenario and checks to make sure we only do the relative path walk when in a bin directory so its at least less likely to bump into this.

Copy link
Contributor Author

@chandlerc chandlerc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for review, PTAL.

toolchain/install/busybox_info_test.cpp Outdated Show resolved Hide resolved
@@ -166,5 +167,58 @@ TEST_F(BusyboxInfoTest, NotBusyboxSymlink) {
EXPECT_FALSE(info.ok());
}

TEST_F(BusyboxInfoTest, LayerSymlinksInstallTree) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call. I've framed it more as a "positive" test that the busybox info finds a relative busybox before walking symlinks further. I think this reflects what you're looking for, but tell me if not.

toolchain/install/busybox_info_test.cpp Outdated Show resolved Hide resolved
toolchain/install/busybox_info_test.cpp Outdated Show resolved Hide resolved
@chandlerc chandlerc requested a review from jonmeow November 22, 2024 22:43
Copy link
Contributor Author

@chandlerc chandlerc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there was only one real outstanding issue here, but I know Jon is out for the next week so if someone else is up to picking up the last iteration on the review, I'd appreciate that. I think I've addressed Jon's comment here.

@@ -166,5 +167,58 @@ TEST_F(BusyboxInfoTest, NotBusyboxSymlink) {
EXPECT_FALSE(info.ok());
}

TEST_F(BusyboxInfoTest, LayerSymlinksInstallTree) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, I've added dedicated testing for this scenario and checks to make sure we only do the relative path walk when in a bin directory so its at least less likely to bump into this.

@chandlerc chandlerc removed the request for review from jonmeow November 23, 2024 05:15
@geoffromer geoffromer self-requested a review November 26, 2024 18:21
Comment on lines 27 to 28
// Returns the busybox information, given argv[0]. This primarily handles
// resolving symlinks that point at the busybox.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm having trouble evaluating the test changes, because this comment doesn't really give me enough information to understand this function's desired behavior under the test conditions. Also, "This primarily handles resolving symlinks" doesn't seem accurate anymore.

After a fair amount of investigation, I think the key information that's missing here is that this function tries to find and return a bin_path referring to a file named "carbon-busybox" that is part of the same Carbon install (as indicated by their paths) as argv0 or some file in the chain of symlinks from argv0 (which means we can presumably access the rest of the install by rewriting bin_path). If that's right, I think it would help to document that as part of the contract.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, updated comment a bit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For what it's worth, I don't think the new comment is specific enough to have helped me understand the implementation or the tests (e.g. I don't think I'd know what "most consistent" means, and I particularly wouldn't expect it to mean that shallower symlink traversals are preferred). However, that level of specificity may not be what other readers need, especially when the function is doing something heuristic like this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried a new version of the comment. Is it any better?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be more explicit: I don't think this is blocking, because I'm not sure how much my experience as a reader generalizes, and I'm not sure how much effort it's worth putting into wordsmithing this API comment.

The previous revision worked better for me than this one, because it focused on the contract (albeit in imprecise terms), whereas this focuses on the implementation, and the implementation is complex enough that it's not clear what it implies about the contract.

If you want to keep iterating on this comment, maybe something like this?

"If argv0 is the path of a busyboxed executable that is part of a valid Carbon install, or indirectly refers to it via symlinks, then bin_path will be the path of the lib/carbon/carbon-busybox file in the same Carbon install."

I'm not sure that fully captures the intent, but that's the kind of contract-focused specificity that would have helped me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, tried wording much closer to that.

And it wasn't clear any of the comments were non-blocking, mostly trying to make progress here as this PR has been in review for a long time now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, tried wording much closer to that.

Looks good, thanks.

And it wasn't clear any of the comments were non-blocking, mostly trying to make progress here as this PR has been in review for a long time now.

Sorry, I'll try to be more explicit about that.

toolchain/install/busybox_info.h Show resolved Hide resolved
Comment on lines +245 to +246
// Starting from the second install uses the relative busybox rather than
// traversing the symlink further.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this important? If they're all symlinks to the same file, won't the user get the same behavior either way?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two installs might be different versions or unrelated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't follow how that could be the case, given that the test seems to say that one of them is a collection of symlinks to the other.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have two installs, let's say version v0.1 in /usr and v0.2 in /usr/local.

For some strange reason, there is also a symlink /usr/local/carbon -> /usr/local/bin/carbon.

There is no symlink from /usr/local/... into /usr (or vice versa).

The problem is that the symlink /usr/local/carbon (which points to /usr/local/bin/carbon and eventually to /usr/local/lib/carbon/carbon-busybox) could use the same path-relative traversal as /usr/bin/carbon and accidentally find /usr/lib/carbon/carbon-busybox. The test is checking that it does not do this.

While we don't really intend symlinks to exist in this way, the goal is to avoid confusing behavior if it does occur.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sounds like you're describing the behavior of RejectSymlinkInUnrelatedInstall, not StopSearchAtFirstSymlinkWithRelativeBusybox?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, for some reason I thought that's where this comment thread was attached.

Because the new logic stops as soon as we see an install-relative busybox, we don't continue to follow symlinks when they, for example, link into a CAS directory tree or some other part of the Bazel output tree. This is what lets us use actual symlinks inside the prefix_root/... file group even though Bazel fully resolves them rather than leave the relative symlink structure in tact: once we see the correctly shaped install tree, we stop traversing symlinks.

There is some explanation about this in the PR description, but let me know if I should add comments here as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, so the point is that even though in this case further traversal will still yield a correctly-shaped install tree, the non-traversal is important because of other cases where that's not true? If so, I think it'd be helpful to explain that in a comment here.

(It would be even better for the symlinks to point to something that's not a correctly-shaped install, so the test code directly illustrates the kind of case we're concerned about, but I don't think that's blocking.)

@chandlerc chandlerc requested a review from geoffromer November 26, 2024 20:41
@chandlerc
Copy link
Contributor Author

Thanks, PTAL!

Comment on lines +245 to +246
// Starting from the second install uses the relative busybox rather than
// traversing the symlink further.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, so the point is that even though in this case further traversal will still yield a correctly-shaped install tree, the non-traversal is important because of other cases where that's not true? If so, I think it'd be helpful to explain that in a comment here.

(It would be even better for the symlinks to point to something that's not a correctly-shaped install, so the test code directly illustrates the kind of case we're concerned about, but I don't think that's blocking.)

Comment on lines 27 to 28
// Returns the busybox information, given argv[0]. This primarily handles
// resolving symlinks that point at the busybox.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, tried wording much closer to that.

Looks good, thanks.

And it wasn't clear any of the comments were non-blocking, mostly trying to make progress here as this PR has been in review for a long time now.

Sorry, I'll try to be more explicit about that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation An issue or proposed change to our documentation toolchain
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants