Adding support for cross compilation, CUDA, and sanitizers #90
Hi! I'm not a maintainer of this repo, but I did write #85 and I'd love to know more about the changes you made to support cross compilation, and/or to see your fork if you're willing to make it public. There are some definite downsides to the approach in #85, and I'm very curious to know how you handle target triples, Bazel platforms, target-specific sysroots, and some of the other cross-compilation pain points. Also, I'm not sure if I'm understanding correctly; did you also add support for aarch64?
This is a combination of things I mentioned in bazel-contrib#90 and hacks for things specific to our environment. I'm putting it all in one commit to start a discussion; I'll split out the pieces that make sense later.
I threw up my fork in 8b06fed. As mentioned in the commit message, it's a combination of the things I mentioned here and some hacks that don't make sense to contribute.

Looking through #85, I think my handling of sysroots is pretty similar (use a separate one for each target, specified via a map). I haven't had any trouble with target triples; I only have a single toolchain for each.

Yes, I did add aarch64 support. It's almost line-for-line identical to #79, so I didn't include it in the list. I did a few refactorings to reduce duplication, but in hindsight I'm not sure those helped maintainability, so I'll take another look once #79 is merged.
Thanks! Some misc notes:
edit: sorry, not sure why GitHub decided to expand out those issue links 😕 (edit: oh, it's because they're in bullet points)
That's neat, and I agree it's somewhat excessive. I think it's unlikely that a toolchain for some untested platform will work without tweaking, so I'd rather prioritize making it easy to get working vs making it possible for somebody to try and end up with a not-quite-working toolchain.
Agreed. WASI seems pretty reasonable. For Linux I don't think you can provide one just for the compiler version. Among other things, the sysroot determines the libc version, which can be different for different environments with the same toolchain.
ubsan has lots of compile-time choices around what exactly to enable. asan is kind of tricky to support because you can't use precompiled shared libraries for the most part. msan is even trickier to support because you also need a rebuilt C++ standard library.

I can't think of many references to using sanitizers with Bazel. I think it requires making enough decisions to set up that most people don't. rules_go has a reference to msan in features, but only as deprecated functionality, and I don't trigger that codepath in my usage of rules_go for some reason...

Broadly, I like having the whole toolchain in one function for easier modification to add special things for an environment. However, that does make it a lot less flexible and requires additional work to support new features/targets/etc. I think it's a breadth (support lots of targets with basic toolchains) vs depth (support a few targets with fancy toolchains) tradeoff. I'm not sure which approach makes the most sense for this project, and I'm happy to clean things up a bit and declare my fork permanent if #75 is the way for this project to go. That's the core of my reason for creating this issue: I'm really not sure what the best answer is.
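For a sense of the compile-time choices involved, a `.bazelrc` fragment along these lines can gate sanitizer flags behind named configs (the config names and flag choices here are illustrative, not anything this repo ships):

```
# Hypothetical sketch; invoke as `bazel build --config=asan //...`.
build:asan --copt=-fsanitize=address --linkopt=-fsanitize=address
build:ubsan --copt=-fsanitize=undefined --copt=-fno-sanitize-recover=all --linkopt=-fsanitize=undefined
# msan additionally needs every C++ dependency (including libc++) rebuilt
# with -fsanitize=memory, which plain flags like these cannot provide.
build:msan --copt=-fsanitize=memory --linkopt=-fsanitize=memory
```

This is roughly why asan/ubsan are mostly a flag-plumbing problem while msan drags in the rebuilt-standard-library question discussed above.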
The main one is not having an additional toolchain. I don't want any of the auto-configuration, and I want everything to be hermetic, so I'd need to carefully look through

Also, I like passing
My thinking was that it's still useful to have just a "working"-ish compiler for some bare metal targets (i.e. embedded), but this seems questionable the more I think about it. I still do think having that kind of mapping logic isn't a bad way to deal with small target triple differences.
By different libc versions do you mean different implementations, or different versions of the same implementation?

For the former, I've seen that included in target triples sometimes.

For the latter, I'm not sure what we as a toolchain should do about this. Up to now (i.e. when not cross compiling) this toolchain has just used whatever libc/etc. is present on the host machine, IIUC. I'm tempted to say this is somewhat out of scope for this repo (users who need specific sysroots can provide one; we'll provide what we think is good for the common use case), but I do recognize that this is a real use case. I'm also not very confident about being able to find a good sysroot to use for all targets (WASI has one, macOS targets do too, but it's definitely trickier for Linux targets).

Couple of unrelated things:
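For reference, the per-target sysroot map mentioned above might look something like this in a `WORKSPACE` (the `sysroot` attribute shape and key names here are my guess at the eventual API, not a confirmed interface of this repo):

```starlark
load("@com_grail_bazel_toolchain//toolchain:rules.bzl", "llvm_toolchain")

llvm_toolchain(
    name = "llvm_toolchain",
    llvm_version = "13.0.0",
    # Hypothetical: one sysroot per target platform, keyed by os-arch.
    sysroot = {
        "linux-x86_64": "@sysroot_linux_x86_64//:sysroot",
        "linux-aarch64": "@sysroot_linux_aarch64//:sysroot",
    },
)
```

The point of the map is that the libc version (and kernel headers, etc.) come from the per-target sysroot, while the compiler itself stays a single hermetic download.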
Having thought about this a bit more, I don't think it's a big deal at all to add "new" features, and I think ^ is a compelling argument for why you really do want this stuff to live in a toolchain and not just hacked into some
I think this is a good way to frame it.
I'm personally more interested in supporting a wide range of targets at the moment, but providing ASAN/TSAN/UBSAN is very compelling. These two things don't seem fundamentally incompatible; I think the only real argument for #75 is that leaning on

It's obviously not up to me – I am not a maintainer of this repo – but I'm personally leaning towards redoing #75 to instead update

(other people should definitely weigh in too)
Is adding

Flags like these living in a toolchain makes me a little nervous, but I admittedly don't have a good understanding of the space.
I've been thinking about doing this too, but I think it makes more sense as a separate repository rule, possibly in a separate project. It could share some of the LLVM download infrastructure, but I'm not sure if it's enough to live in this project. Bare metal toolchains are set up differently from OS ones. There aren't any kernel headers, and it's
I've run into problems with different versions of Debian and Ubuntu having different versions of glibc. Those are about as similar as two operating systems can get, but there are still situations where compiling against a newer glibc uses symbol versions that an older one doesn't have, or where an older glibc is missing a new syscall wrapper that you want to use.
I think any default besides dynamically linking to glibc isn't going to work for most people. A minimal sysroot with a relatively old version of glibc might work though.
Statically linking glibc on a GNU/Linux system (I can't speak for other OSes) is a pretty weird thing to do. It causes a variety of "fun" problems and isn't something I think is worth supporting; https://stackoverflow.com/a/57478728 has some good reasons, for example. Also, I don't see a use case for it. The C++ static library has very contained forward dependencies (just libc and the compiler support library) and poor compatibility among reverse dependencies (all dependents need to use almost exactly the same one, period). glibc is the opposite; its forward dependencies include the kernel, dynamic linker, and a surprising amount of

Statically linking musl for a Linux binary is an interesting idea. I've never done that, but it looks approachable.
That's actually really cool; I hadn't read about it before. Thanks for the pointer :)

Also, agreed that this seems out of scope. I've got a little script that rebuilds libc++ with msan that I can probably get permission to contribute, which is the only part I've needed. Running it manually hasn't been a big deal for me.
Agreed.
I was planning to pull
I actually think it's totally reasonable to have this repo handle bare metal targets too; I think you usually can infer whether to link against libc statically for a particular target triple.

There's definitely more complexity than just that, and I think there's probably even less of a consensus about what the "common use case" would be for a lot of the targets in the embedded space, but I think there's still value in this repo offering, for example, bare metal toolchains that ship with a sysroot that has a pre-compiled static newlib. But you're right; this is probably a discussion for later, and I think regardless there'd be a legitimate want for, e.g., toolchains that "wrap" this one and have the right SDK/compiler flags/retargeted libc for a particular board.

(As a totally unrelated thing, I think there are a lot of potentially neat things to do in this space! I've wanted to build a ruleset that uses QEMU and provides a
I agree; this seems like a good default. Do the tarballs you're using for
It's definitely pretty weird but it's actually a real thing, I'm pretty sure! 😃 But you're absolutely right; when I said statically linked libc I meant more

The one somewhat legitimate use case I know of for actually doing that (for hosted platforms, that is) is totally static binaries for
No problem! It seems super compelling because of how much it simplifies distribution and stuff; it lets you get around needing a huge matrix of sysroots. But also, yeah, it is pretty neat (and out of scope). 😛
👍 I don't know if there are problems with doing so but if we end up supporting MSAN and friends it'd be neat to have the script run in CI and push
Ah, I see. I think the main hesitation I have is that it complicates the repo rule a little bit.

I think (afaik) distributing CUDA libraries and friends is a little dicey and something we'd probably need to have users grab themselves? (Or maybe not – I know TensorFlow makes you grab them yourself, but I'm pretty sure I've seen at least one Bazel workspace that does it for you and prints out lots of warnings re: license agreements.) Regardless, we should be able to figure something out, I think.

I'm also pretty sure I've seen a CUDA ruleset that actually patches
Perfect; thanks.
Yep. They're built by just merging the data tarballs of .deb files from Debian/Ubuntu (I've done it with a few different versions at this point).
It's not super fast, but doing it for releases should be doable.
Yep. My current thinking is to document "download deb files and combine them into a tarball", with an example shell script to run manually. Also, I just found directions from NVIDIA on using Bazel, which say to download the equivalent tarballs directly. Neither of those approaches has any kind of license to accept beforehand, which is interesting. Some of them come with licenses in the tarballs; somebody should look through those for anything relevant when we get there.
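A minimal sketch of such a merge script might look like this. To keep it self-contained and runnable it first fabricates a tiny example .deb; with real downloaded packages you would skip that part. (A .deb is just an `ar` archive whose `data.tar.*` member holds the installed files; all file names here are hypothetical.)

```shell
#!/bin/sh
set -eu
work=$(mktemp -d); cd "$work"

# --- fabricate an example package (stand-in for a downloaded .deb) ---
mkdir -p pkgroot/usr/include
echo '/* example header */' > pkgroot/usr/include/example.h
tar -czf data.tar.gz -C pkgroot .
echo '2.0' > debian-binary
tar -czf control.tar.gz -T /dev/null   # empty control archive
ar rc example.deb debian-binary control.tar.gz data.tar.gz

# --- the actual merging logic: unpack every data tarball into one tree ---
mkdir -p sysroot
for deb in *.deb; do
  tmp=$(mktemp -d)
  cp "$deb" "$tmp/pkg.deb"
  (cd "$tmp" && ar x pkg.deb)
  # The data tarball may be .tar.gz or .tar.xz depending on the distro;
  # GNU tar auto-detects the compression with plain -xf.
  tar -xf "$tmp"/data.tar.* -C sysroot
  rm -rf "$tmp"
done
tar -czf sysroot.tar.gz -C sysroot .
```

This skips dpkg's maintainer scripts and symlink farms entirely, which is usually fine for a compile-time sysroot but worth keeping in mind for packages that generate files at install time.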
With the commits I added today, I have refactored the repo to better support cross-compilation. I also added a test to check cross-compilation from darwin-x86_64 to linux-x86_64, with linkstatic (statically linked user libraries but dynamically linked system libraries) and fully_static (all libraries statically linked).

I think I won't be able to come back to this project for a long time. I will update the README with my current thoughts on the state of things. I will give you two maintainer access to the repo. Please feel free to add new features, refactor, patch, etc. I won't be able to give attention to the PRs, so please go ahead and review among yourselves and merge if you feel satisfied.
@rrbutani thoughts on where to take this now? I think it makes sense to start with me pulling out the parts needed for cross compilation on top of the latest changes, and then making a PR with that.
@brian-brt In the original post you mentioned wanting to be able to cross-compile from k8 to aarch64 and vice versa (for Linux); are there other configurations you wanted support for? AFAIK this repo supports those configurations after #98. Ultimately, I'd like to do something a little more general; I've been meaning to redo #85 on top of the recent changes. That said, I don't think I'm going to get to that in the next few days, so if there are other configurations you need, I think it makes perfect sense to add support for those now.
I guess getting libc++ from the sysroot works. My approach gets it from the Clang release for the target architecture, which means fewer things to configure in the sysroot but more things to download. I'll have to think a bit more about whether it's worth supporting both of them.
I'm interested in sanitizer support. Is this more difficult to add now that bazel-toolchain uses
The default Bazel-generated toolchain on macOS supports |
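In Bazel's Starlark toolchain-config API, a Unix equivalent of those macOS features might be sketched roughly like this (a hedged illustration using `cc_toolchain_config_lib`, not the actual implementation that landed; the action list is abbreviated):

```starlark
load("@bazel_tools//tools/cpp:cc_toolchain_config_lib.bzl", "feature", "flag_group", "flag_set")
load("@bazel_tools//tools/cpp:action_names.bzl", "ACTION_NAMES")

# Enabled per-target or via --features=asan; the flag must reach both
# compile and link actions for the runtime to be linked in.
asan_feature = feature(
    name = "asan",
    flag_sets = [
        flag_set(
            actions = [
                ACTION_NAMES.c_compile,
                ACTION_NAMES.cpp_compile,
                ACTION_NAMES.cpp_link_executable,
                ACTION_NAMES.cpp_link_dynamic_library,
            ],
            flag_groups = [flag_group(flags = ["-fsanitize=address"])],
        ),
    ],
)
```

Analogous `tsan`/`ubsan` features would differ only in the `-fsanitize=` value (plus the extra ubsan toggles discussed earlier in the thread).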
@jfirebaugh You could submit a Bazel PR to add the same features to the Unix toolchain - I'm pretty sure it would be very welcome.
@oliverlee Copying and modifying the generated file is the easiest approach I can think of. After you have confirmed your changes work, you can modify the file in a checkout of Bazel and build it with
There was some discussion here about adding `asan`, `tsan`, and `ubsan` features to the unix toolchains to match macos. bazel-contrib/toolchains_llvm#90 (comment) I've taken my changes local to that project and copied it into Bazel as suggested by @fmeum. I've written some tests but I'm not sure where to place them or if it makes sense to depend on the error messages from asan/tsan/ubsan. Closes #17083. PiperOrigin-RevId: 501213060 Change-Id: I9d973ebe35e4fa2804d2e91df9f700a285f7b404
I think this issue should be split up or closed.
Closing as per the above comment.
This project has been super helpful for configuring toolchains. I've made some pretty extensive changes, adding support for:
I've been using these for a while, and been happy with them. I'm interested in contributing these features back to this project. I figured I'd open an issue to talk about how you'd like to see them before just creating large PRs.
Looking around, #85 takes a different approach to similar features, which touch all the same code.
Also, all 3 of those things require building up supporting tarballs (sysroots, the CUDA SDK, and libcxx respectively). I'm happy to provide directions and/or examples of each along with the code, but they need to be created specifically for each target environment. Any preferences on how to approach that?