Tracking issue for collections reform part 2 (RFC 509) #19986

Closed
aturon opened this issue Dec 18, 2014 · 55 comments
Labels
  • A-collections — Area: `std::collection`
  • B-RFC-approved — Blocker: Approved by a merged RFC but not yet implemented.
  • E-easy — Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue.
  • T-libs-api — Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@aturon
Member

aturon commented Dec 18, 2014

RFC 509

@aturon aturon added A-libs B-RFC-approved Blocker: Approved by a merged RFC but not yet implemented. labels Dec 18, 2014
@aturon
Member Author

aturon commented Dec 18, 2014

cc @gankro

@Gankra
Contributor

Gankra commented Dec 19, 2014

TODO:

Easy

Moderate

  • add append to (usage sketch after this list):
    • DList
    • Vec
    • RingBuf
    • BTreeMap
    • BTreeSet
    • Bitv
    • BitvSet
    • VecMap
  • add split_off:
    • DList
    • Vec
    • RingBuf
    • BTreeMap
    • BTreeSet
    • Bitv
    • BitvSet
    • VecMap
  • Upgrade the Entry API - has PR
  • Add truncate and resize to RingBuf (not specified, but makes sense to do) - has PR
  • Add swap_pop to RingBuf (not specified, but makes sense to do) - has PR
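A minimal usage sketch of the append/split_off shape these items target, shown on Vec with the signatures as they later stabilized in std:

```rust
fn main() {
    let mut a = vec![1, 2, 3];
    let mut b = vec![4, 5, 6];

    // append: moves every element of `b` into `a`, leaving `b` empty.
    a.append(&mut b);
    assert_eq!(a, [1, 2, 3, 4, 5, 6]);
    assert!(b.is_empty());

    // split_off: `a` keeps indices [0, at); the returned vector
    // takes [at, len).
    let tail = a.split_off(4);
    assert_eq!(a, [1, 2, 3, 4]);
    assert_eq!(tail, [5, 6]);
}
```

The per-collection forms differ mainly in the split argument (an index for the sequence types, an ordered key for the BTree map/set).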

@Gankra
Contributor

Gankra commented Dec 19, 2014

@aturon
Member Author

aturon commented Dec 19, 2014

Note: I'm about to post a PR that does the deprecation for Vec methods.

@cgaebel
Contributor

cgaebel commented Dec 19, 2014

cc @bfops I think you were working on upgrading the entry api?

@csouth3
Contributor

csouth3 commented Dec 19, 2014

I can take care of resize for Vec.

@bfops
Contributor

bfops commented Dec 20, 2014

Yeah I'm working on the entry API. Seems to be coming along!

@pczarn
Contributor

pczarn commented Dec 21, 2014

I'll start work on these methods for RingBuf:

Add truncate and resize to RingBuf (not specified, but makes sense to do)
Add swap_pop to RingBuf (not specified, but makes sense to do)

Did you mean swap_remove?

@Gankra
Contributor

Gankra commented Dec 21, 2014

Yes, sorry.

@gamazeps
Contributor

I'm starting the misc stabilization, should be done in 2-3 days :)

@csouth3
Contributor

csouth3 commented Dec 21, 2014

So, some of the misc stuff is done in #20053, but I didn't touch Bitv::{get,set}, and I didn't touch HashMap's and HashSet's iterators (either the documentation, or actually implementing DoubleEndedIterator and ExactSizeIterator for all of the appropriate iterators using the next_back-implemented-as-next strategy noted in the RFC, if that's even what the RFC intends). I also didn't attempt the tedium of moving std::vec to std::collections::vec. So there's certainly plenty of misc left to do, but I wanted to give you a heads up so your time doesn't end up getting wasted duplicating work :)

@gamazeps
Contributor

thx :)

@pczarn
Contributor

pczarn commented Dec 22, 2014

Oh, did you mean swap_back_remove and swap_front_remove? What about grow for ringbuf?

@Gankra
Contributor

Gankra commented Dec 22, 2014

@pczarn That's a good point. I'm inclined to say "follow your heart"? Definitely not grow, since that's being deprecated on Vec itself. Maybe. Probably.

@bfops
Contributor

bfops commented Dec 23, 2014

#20163 mostly upgrades the entry API for HashMap
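For context, the upgraded Entry API (in the shape it eventually stabilized) turns insert-or-update into a single lookup; a minimal usage sketch:

```rust
use std::collections::HashMap;

fn main() {
    let mut counts: HashMap<&str, u32> = HashMap::new();
    for &word in ["a", "b", "a"].iter() {
        // One hash lookup: insert 0 if the key is absent,
        // then get a mutable reference to the value.
        *counts.entry(word).or_insert(0) += 1;
    }
    assert_eq!(counts["a"], 2);
}
```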

@bluss
Member

bluss commented Dec 24, 2014

Wait, why would you want to impl .next_back for the hashmap iterators? That doesn't make sense.

@pczarn
Contributor

pczarn commented Dec 24, 2014

@bluss Implementing ExactSize requires DoubleEndedIterator. See #19327

@bluss
Member

bluss commented Dec 24, 2014

ExactSizeIterator is a hack I proposed so that .enumerate() could be used back to front on slices. I'm unhappy to see it infect everything. Why impl that trait, and if we want it in general, isn't it time to untie it from double ended iteration?

@csouth3
Contributor

csouth3 commented Dec 24, 2014

Yeah, that is actually straight from the RFC:

Explicitly specify HashMap's iterators to be non-deterministic between iterations.
This would allow e.g. next_back to be implemented as next, reducing code complexity.

The reason is so we can impl ExactSizeIterator (maybe?) as pczarn says (I suppose because we want rposition and len?). Though if so, I agree that it would probably be nice to untie it from DoubleEnded if possible, since the bound in this case imposes an undesirable restriction (of course, that would mean we'd have to find a new home for rposition, since that uses next_back()).
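To illustrate the RFC's point (with a hypothetical wrapper, not std's actual iterator types): once an iterator's order is declared meaningless, a `next_back` that simply forwards to `next` still yields every item exactly once:

```rust
// Hypothetical wrapper; not std code. Sound only because we
// declare that the order between the two ends has no meaning.
struct Unordered<I>(I);

impl<I: Iterator> Iterator for Unordered<I> {
    type Item = I::Item;
    fn next(&mut self) -> Option<I::Item> {
        self.0.next()
    }
}

impl<I: Iterator> DoubleEndedIterator for Unordered<I> {
    // "next_back implemented as next": both ends drain the same
    // underlying iterator, so no item is duplicated or skipped.
    fn next_back(&mut self) -> Option<I::Item> {
        self.0.next()
    }
}
```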

@bluss
Member

bluss commented Dec 24, 2014

It's not explicitly deciding anything about ExactSizeIterator

@csouth3
Contributor

csouth3 commented Dec 24, 2014

You are correct. To be completely clear, I was just speculating, because the referenced PR is the origin of the next_back/next discussion. Ultimately, beyond deciding that issue, the RFC doesn't detail exactly what to do here, so I defer to the powers that be for what action to actually take.

@TimDumol
Contributor

TimDumol commented Jan 1, 2015

I took a shot at implementing append() and split_off() for DList over here: #20406

alexcrichton added a commit to alexcrichton/rust that referenced this issue Jan 6, 2015
Part of collections reform part 1 and 2, rust-lang#18424 and rust-lang#19986

* shrink_to_fit
* swap_back_remove
* swap_front_remove
* truncate
* resize
bors added a commit that referenced this issue Jan 11, 2015
Implements the `append()` and `split_off()` methods proposed by the collections reform part 2 RFC.

RFC: rust-lang/rfcs#509
Tracking issue: #19986
@alexcrichton alexcrichton added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Aug 11, 2015
@apasel422 apasel422 added the A-collections Area: `std::collection` label Feb 6, 2016
@alexcrichton
Member

I believe this has all since been implemented, so closing.

@jooert
Contributor

jooert commented Feb 18, 2016

Actually, append and split_off have not been implemented for BTreeMap and BTreeSet yet.

@alexcrichton
Member

Ah good point.

@alexcrichton alexcrichton reopened this Feb 18, 2016
@xosmig
Contributor

xosmig commented Apr 22, 2016

I've started working on append and split_off methods for BTreeMap and BTreeSet.

@apasel422
Contributor

@xosmig #32466 already covers BTreeMap::append.

@xosmig
Contributor

xosmig commented Apr 22, 2016

@apasel422 OK. Then only split_off.

@xosmig
Contributor

xosmig commented Apr 28, 2016

Hello!
I've run into some problems.
To recalculate the sizes of the trees after split_off in O(log n) operations, it's necessary to know the exact size of each subtree.
So, there are several possible solutions:

  1. Add a size field to the LeafNode structure.

I think the benefits are great.

First, it lets us do very flexible splits and merges in only O(log n) operations.
Right now it's possible to merge trees in O(log n) operations, but that's not so interesting without a fast split.

Second, some other interesting and useful functions need it.
For example: kth_element or "get the number of elements between two values". Both in O(log n) operations.

Third, as I understand it (but I am not totally sure), it lets us implement a rope ( https://en.wikipedia.org/wiki/Rope_(data_structure) ) in an easy way, just as BTreeMap<(), V>.

But the overhead is notable (4 or 8 bytes per node, depending on the machine word size). I think it is possible to embed these sizes without memory overhead on 64-bit machines (due to alignment).

  2. Implement split_off in O(log n + "size of one of the results").

2.1) Split in O(log n) operations and then calculate the size of one of the results.

2.2) There is another algorithm that achieves the same complexity without the fast split, but (as I understand it) with a greater constant.

  3. Implement split_off in O(log n + "size of the smaller result"), but with a greater constant.

Split in O(log n) operations and then count the sizes of the two trees via two alternating iterators, stopping as soon as one of the sizes is known (sketched after this comment).

Which implementation should I use?
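A sketch of option 3's counting step (a hypothetical free function, not the real BTreeMap internals): after the O(log n) structural split, pull from both halves in alternation and stop once the smaller half runs out, for a cost of O(min(|left|, |right|)):

```rust
// Hypothetical helper. `total` is the combined length, known
// before the split; `left` and `right` iterate the two halves
// produced by the structural split.
fn split_sizes<L: Iterator, R: Iterator>(
    total: usize,
    mut left: L,
    mut right: R,
) -> (usize, usize) {
    let mut n = 0; // successful pulls from each side so far
    loop {
        if left.next().is_none() {
            // Left side exhausted after exactly `n` items.
            return (n, total - n);
        }
        if right.next().is_none() {
            // Right side exhausted after exactly `n` items.
            return (total - n, n);
        }
        n += 1;
    }
}
```

A split near either edge touches only a few elements this way; the worst case (an even split) still touches only about half of them.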

@xosmig
Contributor

xosmig commented Apr 29, 2016

@apasel422, what do you think?

@apasel422
Contributor

@xosmig This isn't my area of expertise, but I'd recommend opening a discussion thread on https://internals.rust-lang.org/ or submitting a pull request for whichever option is easiest and CCing @jooert and @gereeter. You might also want to take a look at #26227.

@xosmig
Contributor

xosmig commented Apr 29, 2016

@apasel422 The main question is "Is this overhead okay or not?" So, who can decide that?

@gereeter
Contributor

First, note that option one can be slightly optimized by only storing composite sizes in internal nodes.

I'm not a big fan of option one - I expect split_off to be a fairly rare operation, and slowing down everything else and increasing the memory footprint solely for its benefit does not seem like a good idea. Since we explicitly support large collections (as large as can fit in memory, basically), the size field would have to be a usize, which cannot be combined with other fields. The application to ropes is interesting, but really wants a separate implementation - the BTreeMap interface is totally wrong for the job (though the node module might be useful). kth_element and measuring the length of a range seem useful, but again, I see them as niche use cases that really want a specialized sequence data structure.

I don't have strong feelings between options two and three. I think that a variation on option two, where the choice of which side to count is made based on which edge was descended at the root node, would probably be the best. It doesn't have the overhead of option three, but still won't take long if you split near the edges.

There is also a fourth option, namely dropping support for constant-time calls to len(). This is a somewhat more radical change, but it would easily allow for split_off in O(log n) and avoid the question of how to calculate size completely. Moreover, while it might inconvenience users, it is easy for users to wrap a BTreeMap with something that does track length, recovering the behavior for anyone who actually needs accurate lengths. I suspect that there is much higher demand for is_empty than there is for len.
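A minimal sketch of that fourth option's escape hatch, assuming a user-side wrapper (`CountedMap` is a hypothetical name, not a std type):

```rust
use std::collections::BTreeMap;

// Tracks length outside the tree, so the map itself would be
// free to drop O(1) len() support.
struct CountedMap<K, V> {
    map: BTreeMap<K, V>,
    len: usize,
}

impl<K: Ord, V> CountedMap<K, V> {
    fn new() -> Self {
        CountedMap { map: BTreeMap::new(), len: 0 }
    }

    fn insert(&mut self, k: K, v: V) -> Option<V> {
        let old = self.map.insert(k, v);
        if old.is_none() {
            self.len += 1; // only brand-new keys grow the map
        }
        old
    }

    fn remove(&mut self, k: &K) -> Option<V> {
        let old = self.map.remove(k);
        if old.is_some() {
            self.len -= 1;
        }
        old
    }

    fn len(&self) -> usize {
        self.len // O(1), maintained by insert/remove
    }
}
```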

@xosmig
Contributor

xosmig commented Apr 30, 2016

@gereeter
About the rope: it can be implemented as a BTree with an implicit key. It needs just 2 more useful methods on BTreeMap: split_nth (splits off the first n elements) and erase_nth.

@xosmig
Contributor

xosmig commented Apr 30, 2016

@gereeter
"First, note that option one can be slightly optimized by only storing composite sized in internal nodes."
Yes, that's realy great point! There are about (a rough estimate, in fact, on average, more) 6 / 7 leafs. So only 1 / 7 of all nodes should contain 1 extra usize. Also there are about n / (1.5B) = n / 9 nodes on the average, where n is a number of elements in the container. So, only about 1 / 63 n extra usizes. It's only about 1 bit per element on a 64-bit machine and a half bit on a 32-bit machine. It's nothing, isn't it?
Now the size of InternalNode is about 11 * (size_of::<V>() + size_of::<K>()) + 112 on a 64-bit architecture. So, 8 extra bytes would be okay I think.

@xosmig
Contributor

xosmig commented Apr 30, 2016

If I haven't made a blunder, I am convinced that each internal node should store the size of its own subtree.

@xosmig
Contributor

xosmig commented Apr 30, 2016

The numbers above are averages. The worst case is n/5 nodes, of which 1/6 are internal, so n/30 extra usizes: about 2 extra bits per key on a 64-bit architecture and about 1 extra bit per key on a 32-bit architecture. That's also nothing.
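Spelling that worst-case estimate out as arithmetic (same numbers as above, nothing new assumed):

$$\frac{(n/30) \times 64\ \text{bits}}{n\ \text{keys}} \approx 2.1\ \text{bits/key (64-bit)}, \qquad \frac{(n/30) \times 32\ \text{bits}}{n\ \text{keys}} \approx 1.1\ \text{bits/key (32-bit)}$$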

bors added a commit that referenced this issue Jun 2, 2016
Implement split_off for BTreeMap and BTreeSet (RFC 509)

Fixes #19986 and refactors code shared with the append methods.
It splits the tree with O(log n) operations and then calculates sizes by traversing the lower one.

CC @gereeter
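For reference, the method as it landed in std (real API, stabilized after this PR):

```rust
use std::collections::BTreeMap;

fn main() {
    let mut map: BTreeMap<u32, &str> = (0..6).map(|i| (i, "v")).collect();
    // Keys >= 3 move into `high`; `map` keeps the rest. The split
    // itself is O(log n) plus the size recount described above.
    let high = map.split_off(&3);
    assert_eq!(map.len(), 3);
    assert_eq!(high.len(), 3);
}
```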