support find_first, find_last, position_first, and position_last #151

Closed
nikomatsakis opened this issue Nov 17, 2016 · 16 comments

Comments

@nikomatsakis (Member) commented Nov 17, 2016

We currently have find_any and position_any. These are similar to the find and position methods on Iterator, but they don't necessarily return the first match. It'd be nice to also support versions that match the sequential (first-match) semantics. I would want to give those different names (hence find_first and so on) so as to discourage people from reaching for them by default, since find_any will be more efficient.

Implementing this is similar to the existing find_any code but a bit trickier. The way that find_any works is that it makes a consumer with a full() method. full() returns true if execution should stop. We use a single shared atomic and, when a matching item is found, we set this flag to true, signaling others to halt.

To make this work for find_first, we need to halt not when any match is found, but when a match has been found in the sections to our left. The easiest way I can think of to do this is to have each call to split_off, which breaks the consumer into two pieces, allocate an Arc&lt;AtomicBool&gt; (or something like it) for those two pieces to share. The left half will signal by writing true if it finds something; the right half will stop when it either finds something itself or sees that true has been written. The reduce method will then act just as it does now, preferring a value from the left half over the right half.

If you've not hacked on Rayon before, also check out the guide to development.
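
For concreteness, here is a rough sketch of the split_off scheme described above, with invented names (FindFirstState, split, full, found) rather than Rayon's real Consumer API. One detail the description glosses over is how a signal crosses levels of the split tree; the sketch handles it by letting each piece carry the flags of every piece to its left, which is also why a leaf ends up reading roughly log(n) flags:

```rust
use std::sync::Arc;
use std::sync::atomic::{AtomicBool, Ordering};

// Illustrative stand-in for the consumer state; none of these names are Rayon's.
// The root piece starts with no flags on either side (Default).
#[derive(Clone, Default)]
struct FindFirstState {
    // Flags owned by pieces strictly to our left: if any is set, we stop.
    check: Vec<Arc<AtomicBool>>,
    // Flags we raise on a match, so that pieces to our right stop.
    set: Vec<Arc<AtomicBool>>,
}

impl FindFirstState {
    // Analogue of split_off: the two halves share one freshly allocated flag.
    fn split(&self) -> (FindFirstState, FindFirstState) {
        let shared = Arc::new(AtomicBool::new(false));
        let mut left = self.clone();
        left.set.push(shared.clone()); // the left half signals through it
        let mut right = self.clone();
        right.check.push(shared); // the right half watches it
        (left, right)
    }

    // Analogue of full(): true once anything to our left has found a match.
    fn full(&self) -> bool {
        self.check.iter().any(|flag| flag.load(Ordering::Relaxed))
    }

    // Called when this piece finds a match; reduce then prefers the left result.
    fn found(&self) {
        for flag in &self.set {
            flag.store(true, Ordering::Relaxed);
        }
    }
}
```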

@edre (Contributor) commented Nov 17, 2016

Instead of a tree of AtomicBools, what about a single AtomicUsize at the root that stores the smallest index found so far? Each leaf then has only one thing to read instead of log(n) things.
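
A quick sketch of how that shared minimum could look, with made-up names and no claim about where it would live in Rayon: the value starts at usize::MAX, matches lower it with a compare-exchange loop, and a piece is "full" once the best index so far lies strictly to its left:

```rust
use std::sync::Arc;
use std::sync::atomic::{AtomicUsize, Ordering};

// Illustrative only; not Rayon's actual types.
#[derive(Clone)]
struct BestIndex(Arc<AtomicUsize>);

impl BestIndex {
    fn new() -> Self {
        BestIndex(Arc::new(AtomicUsize::new(usize::MAX)))
    }

    // Record a match at `index`, keeping only the smallest index seen so far.
    fn offer(&self, index: usize) {
        let mut current = self.0.load(Ordering::Relaxed);
        while index < current {
            match self.0.compare_exchange_weak(
                current,
                index,
                Ordering::Relaxed,
                Ordering::Relaxed,
            ) {
                Ok(_) => break,
                Err(observed) => current = observed,
            }
        }
    }

    // A piece whose items all sit at positions >= `range_start` can stop
    // as soon as some match strictly to its left has been recorded.
    fn full(&self, range_start: usize) -> bool {
        self.0.load(Ordering::Relaxed) < range_start
    }
}
```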

@nikomatsakis (Member, Author) commented:

@edre hmm, so, I thought about it, and for some reason I thought it wasn't as good, but now that I think about it I suspect I was just wrong.

I was concerned about a tree with multiple levels:

  • 0..100
    • 0..50
      • 0..25
      • 25..50
    • 50..100
      • 50..75
      • 75..100

and now we find a match at index 52 first. The main thing I wanted was that we could cut off the search at 75..100 without having to wait for 0..50. But I guess we can do that, no problem, we just set the current minimum to 52 -- this will cause 75..100 to stop, but not the rest.

OK, yes, seems better! =)

@cuviper (Member) commented Nov 17, 2016

However, that only works for indexed iterators.

@edre (Contributor) commented Nov 17, 2016

You could keep the same approach for unindexed iterators by generating a fake index: set the next most significant bit on the right half at each split. This assumes splitting is balanced enough not to run out of bits in a usize.

  • 0..100 index 0b0000
    • 0..50 index 0b0000
      • 0..25 index 0b0000
      • 25..50 index 0b0100
    • 50..100 index 0b1000
      • 50..75 index 0b1000
      • 75..100 index 0b1100

@cuviper (Member) commented Nov 17, 2016

One possibility is to fake an index on the splits, something like this on 32-bit:

  • 0x80000000 (0..100)
    • 0x40000000 (0..50)
      • 0x20000000 (0..25)
      • 0x60000000 (25..50)
    • 0xC0000000 (50..100)
      • 0xA0000000 (50..75)
      • 0xE0000000 (75..100)
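
A small sketch of the fake-index bookkeeping that the last two comments describe (the names are invented): the root sits in the middle of the usize range, each split moves the left half down and the right half up by the current bit, the bit is halved at every level, and once the bits run out the split simply stops refining the index (see a bit further down):

```rust
// Illustrative only; not Rayon's actual types or field names.
#[derive(Clone, Copy)]
struct FakeIndex {
    index: usize, // ordering key compared against the shared "best so far"
    bit: usize,   // next bit to hand out; zero once we have run out of bits
}

impl FakeIndex {
    fn root() -> Self {
        // The root sits in the middle of the range: 0x8000_0000 on 32-bit.
        let top = 1usize << (usize::BITS - 1);
        FakeIndex { index: top, bit: top >> 1 }
    }

    // Each split moves the left half down and the right half up by `bit`,
    // then halves `bit`. When the bits run out we stop refining, so nearby
    // leaves merely lose the ability to distinguish themselves.
    fn split(self) -> (FakeIndex, FakeIndex) {
        if self.bit == 0 {
            return (self, self);
        }
        let left = FakeIndex { index: self.index - self.bit, bit: self.bit >> 1 };
        let right = FakeIndex { index: self.index + self.bit, bit: self.bit >> 1 };
        (left, right)
    }
}

fn main() {
    let root = FakeIndex::root();            // on 32-bit: 0x8000_0000
    let (left, right) = root.split();        // on 32-bit: 0x4000_0000 and 0xC000_0000
    let (_, left_right) = left.split();      // on 32-bit: 25..50 maps to 0x6000_0000
    assert!(left_right.index < right.index); // ordering still matches position
}
```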

@edre (Contributor) commented Nov 17, 2016

Great minds think alike ;)

@nikomatsakis (Member, Author) commented Nov 17, 2016

Clever. Use a u64, not a usize -- I guess past a certain point we can just max out anyway, right? (i.e., I'd prefer not to assume splitting is balanced)

@edre (Contributor) commented Nov 17, 2016

Do 32 bit platforms support AtomicU64?

If splitting is unbalanced, you can just stop changing the index once you run out of bits; at worst, leaves close to each other can't share progress.

@cuviper (Member) commented Nov 17, 2016

> Do 32 bit platforms support AtomicU64?

No, and that's not stable yet anyway.

@schuster (Contributor) commented Dec 8, 2016

Anyone mind if I self-assign this? It looks like a good item for me to work on to learn Rayon, and I have the start of an implementation.

Admittedly, this might be blocked on AtomicU64 becoming stable, if that's definitely what's desired for unindexed iterators.

@cuviper (Member) commented Dec 8, 2016

@schuster go for it! I think GitHub will only formally assign to project members, but consider it yours.

Just stick with AtomicUsize for now, especially if you make sure it still works once the bits run out, as @edre suggested.

@schuster (Contributor) commented Dec 9, 2016

Okay, I have a basic (to-be-cleaned-up) implementation that assumes the iterator is unindexed, but it sounds like we want to use the actual index in the case of indexed iterators. How can I do that kind of specialization within Rust when the method is declared as part of the general ParallelIterator trait?

I've just started to learn about impl specialization - is that the way to go?

@cuviper (Member) commented Dec 9, 2016

We can't use specialization until that's stabilized in Rust. We do some "fake" specialization in collect to regain performance, but that's a bit fragile, not the kind of thing we should do a lot.

In this case, I think the only advantage of indexed iterators is that it would be easier to split perfectly. Once you've written the unindexed way, it shouldn't make much difference. I expect the logic of that unindexed mode to be a bit tricky, but the performance impact is probably not enough to be a concern.

@schuster (Contributor) commented:

Thanks, that works.

Also, I see three possible ways to write the find_first/find_last code, and I'm curious what might be preferred:

  1. Write two completely separate modules, with some repeated code.
  2. Pass around a "direction" value representing first/last, and branch based on it whenever the first or last code needs to work differently.
  3. Use polymorphism: implement a trait on the direction value, with a method for each piece of specialized code.

Option 1 of course repeats code (mostly boilerplate), while 2 and 3 each introduce a different kind of overhead. My hunch is that option 2 is the best fit in Rust (because option 3 would have to jump to an arbitrary code pointer and probably impede various optimizations), but I'm still developing an intuition for what overhead is acceptable in this kind of program. Thoughts?
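
For concreteness, a hypothetical sketch of what option 2 might look like; the enum and function names here are made up, not a prescription for the eventual code:

```rust
// Hypothetical names; only meant to illustrate the "direction value" approach.
#[derive(Clone, Copy, PartialEq, Eq)]
enum MatchPosition {
    Leftmost,  // find_first / position_first
    Rightmost, // find_last / position_last
}

// The one place the two directions differ: which of two candidate indices wins.
fn better_position(new: usize, current: usize, pos: MatchPosition) -> bool {
    match pos {
        MatchPosition::Leftmost => new < current,
        MatchPosition::Rightmost => new > current,
    }
}
```

Everything else (splitting, the shared atomic, the reduce step) could then be written once and simply consult this predicate.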

@cuviper (Member) commented Dec 13, 2016

Option 2 seems simplest -- I'd see how that goes.

Option 3 shouldn't result in function pointers unless you force it through trait objects. Plain generic code should still monomorphize to statically-dispatched function calls, if not inlined entirely.

Generally speaking, try not to worry about fine-grained performance until you're sure it's a problem. Write clean working code first, then see where the performance bottlenecks are.
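
To illustrate the dispatch point with a tiny invented example: routing the per-direction code through a generic parameter monomorphizes into separate, statically dispatched copies; a function pointer only appears if you explicitly go through a dyn trait object.

```rust
// Illustrative only; these are not proposed Rayon names.
trait Direction {
    fn prefer(new: usize, current: usize) -> bool;
}

struct First;
struct Last;

impl Direction for First {
    fn prefer(new: usize, current: usize) -> bool { new < current }
}

impl Direction for Last {
    fn prefer(new: usize, current: usize) -> bool { new > current }
}

// Monomorphized per `D`: keep_better::<First> and keep_better::<Last> are
// compiled as separate, directly called (and likely inlined) functions.
fn keep_better<D: Direction>(new: usize, best: &mut usize) {
    if D::prefer(new, *best) {
        *best = new;
    }
}
```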

@schuster (Contributor) commented:

Good advice, thanks.
