-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implicit HeaderMaps #3712
Implicit HeaderMaps #3712
Conversation
Can one of the admins verify this patch? |
/cc @mhlopko First the fundamentals:
Since these are important enough questions, I took the liberty of only skimming over the change, and there were two things that came to mind:
|
Thanks for the feedback @lberki!
First, a HeaderMap is included in the Example:
The compiler resolves conflicts internally. At a higher level, it is a similar issue if a user added conflicting include paths; Say, "Header.h" on disk in 2 different directories and both directories are
HeaderMaps are supported on Apple's GCC Fork. I'm not familiar with MSVC, but it looks like on WinObjc, they have support for HeaderMaps.
I definitely think it should generate them based on For some of my use cases, a custom map is required: I need to specify a custom namespace for headers. The namespace is not exposed to bazel For example, in CocoaPods, headers are included from a given Pod with a namespace.
And on the file system:
The string So for generating maps for CocoaPods or other Apple style libs, I think manually specifying namespaced headers is useful. Doing so at the
Waiting for official Bazel releases isn't a big issue for me. At this point, I'm using custom builds for this and other changes. |
I agree this should be implemented very similarly to module maps - that is, automatically generated from the build rules. For namespacing, it sounds like we could add a hdrs_ns attribute that would set the mapped to namespace for a particularly library? |
How does the compiler find the header map? As far as I can tell, this change doesn't change its command line. What happens if multiple header maps are supplied? Is there a precedence between them? If so, what is it? I agree with @r4nt -- it looks like it's better to compute header maps from information already given to Bazel so that one doesn't run the risk of providing inconsistent information in the header_map build rule and in e.g. How stable are header maps? The links you provided are to Clang internals, which doesn't fill me with confidence that it will remain the same for a reasonable amount of time. |
The header map format hasn't changed in the last 10 years and probably has been stable for a long time before that. However, it is undocumented. It's essentially a hashmap to accelerate file system queries and thus replaces directory lookups with a path string lookup. Header maps are specified with -Ifoo.hmap on the command line and act pretty much the same as specifying a directory. You can have multiple of those flags or mix them with normal -Idir and they will be queried in command line order. Automatically generating header maps to accelerate Clang build times (GCC and MSVC are unlikely to ever implement this) seems worth investigating. Getting the granularity right for that might be tricky though, you cannot represent include order within a single header map but making many small header maps destroys the potential compile time speedup. |
Drive by - Why would you want this? Last time I looked, header maps (which were created by Xcode), they were keyed by the header name only. So if you have |
Thank you kindly for the consideration and feedback! As @d0k mentioned, HeaderMaps have been around for a while. They have been default enabled in production Xcode under Clang and before that Apple GCC. For my iOS app, the traditional approach of header lookup does not scale. I've got a large target and directory of ~1700 headers, a few small internal deps, and ~50 external deps ( CocoaPods ). Total, it puts me at ~4500 headers and respective directories. The poor performance even at this scale is due to the nature of the compiler's header search algorithm. The complexity of linear searching over included directories is the issue here. Build times without this are a non starter even on the best Apple hardware. Anecdotally, as an experiment, I replaced HeaderMaps with regular includes in Xcode and saw a 3x slowdown in Xcode's clean build time. Ultimately, I think having a few maps at most for each compiler invocation ( and no other includes ) will be the fastest. For a given target, I should be able to implicitly derive a headermap map based on I'm going to test a few different implementations to see how much build time I can regain. An implementation that just works would be ideal it's possible. |
One question is whether we should have a header map for compiling each target; that way, we have |
So I've tested out a few different approaches to implementing this in of Bazel and the performance implications. Header maps are created internally in Bazel and implicitly. ObjC Namepace heuristicNamespaced keys allow users to include headers from ObjC targets in the convention of By default headermaps are created with a namespace based on the In Dependency headers and namespaces are added transitively with a From the Performance: Cuts clean build time in halfBy using 2 headermaps the header searches are most efficient: a small Clean build time on base commit:
Clean build time after implicit header maps:
Overall, I think it's a good improvement 🎉 |
I think this is a good change, and makes your imports clear. Can header maps be used with |
@dmishe thanks for the feedback. On this branch, the header search routine will find imports of the form: |
af58828
to
8928f51
Compare
I've updated the patch based on the above discussions: added a couple tests and removed the original rule. The implicit header map logic handles all use cases I could think of. I also wrapped it behind an option cc @mhlopko @lberki @ulfjack Is there any other feedback that you have for this? |
" hdrs = ['headerb.h'],", | ||
")"); | ||
|
||
assertThat(getConfiguredTarget("//x:objc")).isNotNull(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't see a way to actually build //x:objc
here, but that could be good to have it running end to end!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Java tests are not integration tests. You'd need to write an integration test to do an actual build.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ulfjack thanks for the feedback. I'm not too familiar with the integration testing system yet but will look into this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well the easiest would be to take inspiration from our existing integration tests, e.g. https://github.com/bazelbuild/bazel/blob/master/src/test/shell/integration/cpp_test.sh
8928f51
to
eb46c3c
Compare
@mhlopko friendly ping! Would you kindly take another look at this? If there anything you all think can be done here to get this in a mergeable state, I'm happy to discuss 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't have time to finish the review today, so sending at least partial review out. One more note - please do not make it objc specific, it's useful for c++ too. And as you'll see in the comments, pls integrate it with include_prefix and strip_include_prefix.
headerMapsBuilder.add(internalHeaderMap); | ||
|
||
String namespace; | ||
if (ruleContext.attributes().has("header_namespace")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we use include_prefix and strip_include_prefix for this? So we don't need another attribute.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And for "this" I mean header_namespace attribute.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could take a stab at making it work with the existing include_prefix
and strip_include_prefix
attributes in cc_library
too.
None the less, I think the behavior of header_namespace
is useful for objc_library
and especially objc
that was written under Xcode's conventions.
The current implementation of objc_library
-> header_namespace
has quite different abilities than cc_library
-> include_prefix
/ cc_library
-> strip_include_prefix
. Basically, header_namespace
puts every single header under a flat namespace regardless of that header's location on disk.
Even if objc_library
gained the include_prefix
and strip_include_prefix
attributes, I don't think it'd be trivial to setup includes for the following repo:
LibA
BUILD
A1
SRC1.c
LibB
BUILD
C1
HDR1.h
C2
HDR2.h
D1
HDR3.h
.. continues for tens of levels of headers
File LibA/BUILD
objc_library(
name = "LibA",
srcs = ["A1/SRC1.c"],
deps = ["//LibB:LibB"],
)
File LibA/A1/SRC1.c
// The header file resides at LibB/C1/D1/HDR3.h
#import "B/HDR3.h"
File LibB/BUILD
objc_library(
name = "LibB"
hdrs = ["C1/HDR1.h", "C2/HDR2.h", "C2/D1/HDR3.h"],
# Every single header is available under namespace "B"
# i.e. #include "B/HDR3.h"
header_namespace = "B"
)
When we compile LibA/SRC1.c
we are passed the following module map:
B/HDR1.h -> LibB/C1/HDR1.h
B/HDR2.h -> LibB/C2/HDR2.h
B/HDR2.h -> LibB/C2/D1/HDR2.h
... more
So includes in File SRC1.c
are valid
// The header file resides at LibB/C1/D1/HDR3.h
#include "B/HDR3.h"
In CocoaPods.org
and other iOS development, this include style is widely used
Given the usage of includes in iOS development ( i.e. in Texture ) and Xcode's include behavior I think it makes sense for objc_library
to have the header_namespace
attribute.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I argued for something like header_namespace back in the day; I do think you bring good arguments why it provides an easier user experience. Given that, it might make sense to also support this for C++, as well as supporting it both with and without header maps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback @r4nt! For me, in addition to perf benefits, this feature key to making Xcode based source compatible with Bazel in a way that's not to much effort too.
If this is can be upstreamed, then I should be able to add header_namespace
to C++ library as well.
Ideally, getting header_namespace
to work without header maps could be broken out into a separate PR. I'm not exactly sure what that would entail, but it seems doable.
Does that sound like a good plan @mhlopko?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the "flattening" behavior of header_namespace
really something people rely on? It sounds confusing and error prone to me because of header name collisions. In order to implement this without header map (because people might want to use header_namespace even when they don't want header maps) we would have to pass -Idir
for every dir
in the target. I need more convincing that this is good idea @jerrymarino @r4nt :) If we remove "flattening" requirement, then header_namespace behaves exactly like include_prefix. Is there any way to collect more data on how often people rely of the "flattening"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to go into the direction of allowing OS code to work together without changes, it's fundamentally similar to the include_prefix and strip_include_prefix, while being simpler to understand and providing more flexibility, so I'd say it's a clear win.
ImmutableList.Builder<Artifact> headerMapsBuilder = ImmutableList.builder(); | ||
String targetName = ruleContext.getTarget().getName(); | ||
|
||
HeaderMapInfo.Builder internalHeaderMapInfo = new HeaderMapInfo.Builder(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also include virtual headers for include_prefix/strip_include_prefix
depHeaderMapInfo.addHeaders(publicTextualHeaders); | ||
|
||
// Merge all of the header map info from deps. The headers within a given | ||
// target have precedence over over dep headers ( See |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SuperNit: extra space between ( and See.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will Fix
// the working directory ( i.e. exec root ) in this form | ||
// and it must be after including the header map files | ||
contextBuilder.addIncludeDir(PathFragment.create(".")); | ||
ImmutableList headerMaps = headerMapsBuilder.build(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SuperNit: Reformat pls. In general, Bazel follows google java style guide (https://google.github.io/styleguide/javaguide.html). I'll take care of that when importing, but to increase readability pls try to adhere to column limit: 100, extra points for https://google.github.io/styleguide/javaguide.html#s4.5.1-line-wrapping-where-to-break :) But I'm nitting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, thanks for the tip. I'll try to make this follow to the style guide closely :)
public final class ClangHeaderMap { | ||
// Logical representation of a bucket. | ||
// The actual data is stored in the string pool. | ||
private class HMapBucket { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Let's make this comment a javadoc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And consider renaming to HeaderMapBucket?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I'll rename it to HeaderMapBucket
private static final String GUID = "4f407081-1951-40c1-befc-d6b4daff5de3"; | ||
|
||
// C++ header map of the current target | ||
private final Map <String, String> headerMap; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make ImmutableMap
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, will do!
) { | ||
super( | ||
owner, | ||
ImmutableList.<Artifact>builder().build(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
replace with ImmutableList.of()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
owner, | ||
ImmutableList.<Artifact>builder().build(), | ||
output, | ||
/*makeExecutable=*/ false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kudos for using param comments! :) Could you add spaces around makeExecutable=?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, will do!
} | ||
|
||
@Override | ||
public DeterministicWriter newDeterministicWriter(ActionExecutionContext ctx) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to use short names here for ctx and b. I'd personally go with headerMap instead of hmap, but that's subjective taste.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion, Sir. I'll write out the actual names.
for(Map.Entry<String, String> entry: headerMap.entrySet()){ | ||
String key = entry.getKey(); | ||
String path = entry.getValue(); | ||
f.addString(key + "->" + path); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to have this -> in the key.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will Remove!
@mhlopko thanks so much for going through this and for the feedback, I greatly appreciate your time! I responded with comments inline. Most of it is straightforward enough. I still think there's good value in having the For What to do you think? |
My take is that header maps should only be a performance trick, and that we should always have a way to at least make it compile with compilers not supporting them, otherwise we generate a confusing user experience. The say we'd do that is very similar to what cc_inc_library does - we'll create a directory symlinking in all the headers and put a -I in for that directory. |
eb46c3c
to
91f275a
Compare
Ok, so I've freed some cycles to work on this and build perf again 🎉 So far I've:
Overall, I can't agree more this rendition is better than I'd like to see this merged because it is making Bazel faster and easier to use for iOS/OSX development - especially for big apps migrated from Xcode, like Pinterest. Is it possible to support the symlink'd version ( non Apple gcc / Clang ) as a followup PR? Doing this incrementally will prevent it from not being part of Bazel and prevent more bitrot. If we've got a plan to move forward, I'll work through the rest of @mhlopko 's comments ASAP. What do you think @ulfjack @mhlopko @r4nt |
I'm also wondering what the plans for this PR are... note that the rebased version is already almost 1,000 commits behind the latest Bazel release. |
One problem that I just noticed with this PR (after rebasing on master and playing with it) is that it doesn't seem to add a way to get at the header map artifact from the |
This is a squashed version of PR bazelbuild#3712 bazelbuild#3712 by Jerry Marino <[email protected]> It adds an objc_library option called flatten_virtual_headers that modifies the behavior of include_prefix such that instead of prepending the path to the header's actual path, the headers are mapped to the include_path namespace. E.g. src/heaaders/foo/bar.h src/headers/bar/baz.h with the options: include_prefix = "FOO" flatten_virtual_headers = True would let you include bar.h and baz.h as #include <FOO/bar.h> #include <FOO/baz.h>
I have some bandwidth to shepherd this through (I created #5587 with a rebased version on top of master and feedback from here incorporated). I don't know where is a good way to follow the discussion though. I'm trying to clean up the code and tighten a few corners, and one of the things that I noticed is that the current version of the code adds both private and public headers to the header maps. For example, this:
Will produce two header maps,
So it seems that the private headers are being exposed in both header maps. Doing this import:
in a dependency fails when used with a sandbox (because even though the header is mapped in the header map file, bazel doesn't copy it to the sandbox). However, if I run the build with I think we want the (Meta note: do we want to continue the conversation here or in #5587? I created a new PR since I can't change the code in this one and I'm actively hacking on it). |
@r4nt I would hold off on the review. I'm finding it hard to wire in the Swift support because of the way HeaderMaps are implemented here. Namely, I think they should be closer to how ModuleMaps are implemented (i.e. remove |
Looped in djasper as he has worked on making c++ header detection faster (we still need to open source include scanning). Once we have include scanning, header maps should nicely fall out, making c++ compiles potentially a whole bit faster. |
Hmm, interesting. Would it make sense to continue working on this PR or would you suggest waiting for @djasper's work to be open sourced? Note that I'm interested in two aspects of this PR, one is the performance gain of using header maps, but the other is the ability to tweak the header namespace. There is a convention in the iOS apps world that headers from the same target can be imported as:
And headers from dependencies are imported as
Regardless of where they are located in the file system. The combination of the |
Oh, I wasn't suggesting this is gated on anything, just that djasper might want to help making sure this works well with his we envision the c++ rules to work. |
Ok, my main worry is that @djasper's work could be rewriting I'm currently modeling header maps pretty much like module maps, cleaning up interfaces and simplifying the classes. It'll probably take me another week or so to have something to show. Is there any place where I can get a sense of what @djasper is doing or what his vision for C++ is? |
@ob another, pragmatic suggestion is to pull this logic out into a skylark rule and try to add headermaps support via |
I have not planned any significant change in any of the regions mentioned. I am very much focussing on the execution phase, i.e. providing the compiler with the right inputs efficiently, improving input discovery and .d file scanning. |
@jerrymarino that's an interesting idea. I'm not sure how that would manage to propagate the header maps through the objective-c dependencies that don't use the wrapper macro though... it seems to me that as long as @djasper Thanks for confirming what you're doing. I'll plow ahead with my changes then (I'm just repackaging the code and cleaning up some corner cases). I'll post back once I'm done. |
@ob with some basic skylark, I think it is possible. I'm testing out a few ideas:
A theoretical skylark rule has a usability tradeoff since it can't automatically work as the current PR does. |
Hi all, |
Well there is not much to work on in rules_cc yet (so far only macros delegating to native rules only). But eventually, yeah :) |
Ok, I'll close this one as well. |
Request for feedback on an implementation of C++ HeaderMaps in Bazel.
A HeaderMap is a data structure that allows the compiler to lookup included
headers in constant time.
Traditionally, the compiler has to parse a large string of
iquote
includes,and then search these directories for a given header. This is slow for many
reasons.
The protocol of HeaderMap is implemented within compilers. Please find the
Lexer implementation in Clang.
https://clang.llvm.org/doxygen/HeaderMapTypes_8h.html
https://clang.llvm.org/doxygen/HeaderMap_8cpp_source.html
Use case:
I'm seeing a massive increase in build performance by using this. It cut my
clean build time in half.
Performance data:
Build time before HeaderMap:
Build time after header maps on the entire project:
Additionally, this solves the problem of having namespaced headers which is used
in CocoaPods all over. Using a namespace makes includes more clear since it is
easier for the user to distinguish where the header was derived.
Implementation:
At the ObjC level, headermaps are created with a namespace of the given target.
In
objc_library
it is possible for the user to override the value of thenamespace via the new attribute,
header_namespace
.By using 2 headermaps the headersearchs are most efficient: a headermap for the
current target, and a header map with namespaced includes.
Users can include headers from ObjC targets in the convention of
Namespace/Header.h
. Projects that don't use namespacing should see benefits aswell: includes of the form
Header.h
will be read from the headermap.HeaderMapInfo
contains all of the transitive info for dependent header maps,and is merged together into a single map. This yields much better performance
than multiple headermaps.
This is my first PR to the Bazel repo, so any suggestions or feedback is greatly
appreciated!