make zip for >2 arguments about 20x faster
I realized the only difference between Zip2 and general Zip should be a single `...` token.
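To illustrate how a single `...` can separate the two-iterator case from the general one, here is a minimal sketch of tuple splatting. The helper names `pair_item` and `flat_item` are hypothetical and are not taken from the commit; the sketch only assumes that a general zip can be built by pairing a first iterator with a zip of the rest.

```julia
# Hypothetical helpers (not from the commit) illustrating the splat.

# Two-iterator case: combine one value from each iterator into a pair.
pair_item(a, b) = (a, b)

# General case: `rest` is already a tuple produced by a nested zip,
# so splatting it with `...` flattens the result instead of nesting it.
flat_item(a, rest) = (a, rest...)

pair_item(1, (2, 3))  # nested tuple: (1, (2, 3))
flat_item(1, (2, 3))  # flat tuple:   (1, 2, 3)
```

With a one-element `rest`, `flat_item` produces the same pair that `pair_item` would produce from two plain values.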
1 parent 34dbb0f, commit eaf5a95
Showing 1 changed file with 18 additions and 28 deletions.
Do we really still need `Zip2` after this?
Well, they differ by a `...`, so as far as I can tell, yes. It's also kind of reasonable to only have `Zip2`, and do `for (a,(b,c)) in zip(x,zip(y,z))`, but `zip(x,y,z)` is now a bit faster than that. `Zip2` is also currently still much faster than `Zip`; it usually gets fully inlined.
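For readers comparing the two spellings mentioned above, here is a small sketch showing that the nested two-argument form and the flat three-argument form visit the same elements. The arrays `x`, `y`, and `z` are made-up sample data, and the sketch says nothing about relative speed.

```julia
x = [1, 2, 3]
y = [10, 20, 30]
z = ["a", "b", "c"]

# Nested form: only a two-argument zip is needed, but the loop
# variable has to be destructured as a nested tuple.
nested = [(a, b, c) for (a, (b, c)) in zip(x, zip(y, z))]

# Flat form: the multi-argument zip yields flat 3-tuples directly.
flat = [(a, b, c) for (a, b, c) in zip(x, y, z)]

nested == flat  # true: both forms produce the same sequence of tuples
```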
Jeff, since you're working on iterators: do you understand why `zip` and `enumerate` are quite a lot slower than writing the same operation out manually?
References:
http://stackoverflow.com/questions/27577162/is-it-good-practice-to-use-counters/27578686#27578686
#9080
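As a concrete picture of what "writing the same operation out manually" could mean, here is a hedged sketch of one loop written with `enumerate` and once with a hand-maintained counter. The function names `sum_enumerate` and `sum_manual` are hypothetical; the sketch only shows that the two forms compute the same result, not how their speeds compare on any particular Julia version.

```julia
# Index-weighted sum using enumerate, which yields (index, value) tuples.
function sum_enumerate(v)
    s = 0
    for (i, x) in enumerate(v)
        s += i * x
    end
    return s
end

# The same operation with a manually maintained counter.
function sum_manual(v)
    s = 0
    i = 0
    for x in v
        i += 1
        s += i * x
    end
    return s
end

v = [2, 4, 6]
sum_enumerate(v) == sum_manual(v)  # true: 1*2 + 2*4 + 3*6 == 28 in both
```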
My first guess would be to try adding `@inline` to Enumerate's `next` method.
Good idea, but no dice. Still 2x slower.
That is interesting! Looking at the generated IR, it looks like LLVM was not able to fully scalar-convert the tuples that Enumerate uses. There is some extra unnecessary shuffling to form and unpack 2-element vectors using vpextrq/vpunpcklqdq.
I'm looking at this and wondering why the original implementation was slow, given #26765. Is it the type instability in `done`? That could be solved by using a recursive version of `all`.
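One way to read "a recursive version of `all`" is a predicate check that recurses over a tuple by peeling off one element at a time, so each call sees a concrete tuple type. The following is only a sketch under that assumption; `rec_all` is a hypothetical name and is not the code proposed in this thread or part of Base.

```julia
# Hypothetical helper (not Base.all): recursive `all` over a tuple.

# Base case: the predicate holds vacuously for the empty tuple.
rec_all(f, t::Tuple{}) = true

# Recursive case: test the first element, then recurse on the rest.
# Base.tail returns the tuple without its first element.
rec_all(f, t::Tuple) = f(t[1]) && rec_all(f, Base.tail(t))

rec_all(iseven, (2, 4, 6))          # true
rec_all(iseven, (2, 3, 6))          # false
rec_all(isempty, ((), "", Int[]))   # true, even with mixed element types
```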
Something like