Tracking issue for collections reform part 2 (RFC 509) #19986

Closed
aturon opened this issue Dec 18, 2014 · 55 comments
Labels
  • A-collections — Area: `std::collection`
  • B-RFC-approved — Blocker: Approved by a merged RFC but not yet implemented.
  • E-easy — Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue.
  • T-libs-api — Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@aturon
Member

aturon commented Dec 18, 2014

RFC 509

@aturon aturon added A-libs B-RFC-approved Blocker: Approved by a merged RFC but not yet implemented. labels Dec 18, 2014
@aturon
Member Author

aturon commented Dec 18, 2014

cc @gankro

@Gankra
Contributor

Gankra commented Dec 19, 2014

TODO:

Easy

Moderate

  • add append to (usage sketch after this list):
    • DList
    • Vec
    • RingBuf
    • BTreeMap
    • BTreeSet
    • Bitv
    • BitvSet
    • VecMap
  • add split_off:
    • DList
    • Vec
    • RingBuf
    • BTreeMap
    • BTreeSet
    • Bitv
    • BitvSet
    • VecMap
  • Upgrade the Entry API - has PR
  • Add truncate and resize to RingBuf (not specified, but makes sense to do) - has PR
  • Add swap_pop to RingBuf (not specified, but makes sense to do) - has PR
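A minimal usage sketch of the append/split_off shape these items target, shown on Vec with the signatures as they later stabilized in std:

```rust
fn main() {
    let mut a = vec![1, 2, 3];
    let mut b = vec![4, 5, 6];

    // append: moves every element of `b` into `a`, leaving `b` empty.
    a.append(&mut b);
    assert_eq!(a, [1, 2, 3, 4, 5, 6]);
    assert!(b.is_empty());

    // split_off: `a` keeps indices [0, at); the returned vector
    // takes [at, len).
    let tail = a.split_off(4);
    assert_eq!(a, [1, 2, 3, 4]);
    assert_eq!(tail, [5, 6]);
}
```

The per-collection forms differ mainly in the split argument (an index for the sequence types, an ordered key for the BTree map/set).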

@Gankra
Contributor

Gankra commented Dec 19, 2014

@aturon
Member Author

aturon commented Dec 19, 2014

Note: I'm about to post a PR that does the deprecation for Vec methods.

@cgaebel
Contributor

cgaebel commented Dec 19, 2014

cc @bfops I think you were working on upgrading the entry api?

@csouth3
Contributor

csouth3 commented Dec 19, 2014

I can take care of resize for Vec.

@bfops
Contributor

bfops commented Dec 20, 2014

Yeah I'm working on the entry API. Seems to be coming along!

@pczarn
Contributor

pczarn commented Dec 21, 2014

I'll start work on these methods for RingBuf:

Add truncate and resize to RingBuf (not specified, but makes sense to do)
Add swap_pop to RingBuf (not specified, but makes sense to do)

Did you mean swap_remove?

@Gankra
Contributor

Gankra commented Dec 21, 2014

Yes, sorry.

@gamazeps
Contributor

I'm starting the misc stabilization, should be done in 2-3 days :)

@csouth3
Contributor

csouth3 commented Dec 21, 2014

So, some of the misc stuff is done in #20053, but I didn't touch Bitv::{get,set}, and I didn't touch HashMap's and HashSet's iterators (either the documentation, or actually implementing DoubleEndedIterator and ExactSizeIterator for all of the appropriate iterators using the next_back-implemented-as-next strategy noted in the RFC, if that's even what the RFC intends). I also didn't attempt the tedium of moving std::vec to std::collections::vec. So there's certainly plenty of misc left to do, but I wanted to give you a heads up so your time doesn't end up getting wasted duplicating work :)

@gamazeps
Contributor

thx :)

@pczarn
Contributor

pczarn commented Dec 22, 2014

Oh, did you mean swap_back_remove and swap_front_remove? What about grow for ringbuf?

@Gankra
Contributor

Gankra commented Dec 22, 2014

@pczarn That's a good point. I'm inclined to say "follow your heart"? Definitely not grow, since that's being deprecated on Vec itself. Maybe. Probably.

@bfops
Contributor

bfops commented Dec 23, 2014

#20163 mostly upgrades the entry API for HashMap
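For context, the upgraded Entry API (in the shape it eventually stabilized) turns insert-or-update into a single lookup; a minimal usage sketch:

```rust
use std::collections::HashMap;

fn main() {
    let mut counts: HashMap<&str, u32> = HashMap::new();
    for &word in ["a", "b", "a"].iter() {
        // One hash lookup: insert 0 if the key is absent,
        // then get a mutable reference to the value.
        *counts.entry(word).or_insert(0) += 1;
    }
    assert_eq!(counts["a"], 2);
}
```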

@bluss
Member

bluss commented Dec 24, 2014

Wait, why would you want to impl .next_back for the hashmap iterators? That doesn't make sense.

@pczarn
Contributor

pczarn commented Dec 24, 2014

@bluss Implementing ExactSize requires DoubleEndedIterator. See #19327

@bluss
Member

bluss commented Dec 24, 2014

ExactSizeIterator is a hack I proposed so that .enumerate() could be used back to front on slices. I'm unhappy to see it infect everything. Why impl that trait, and if we want it in general, isn't it time to untie it from double ended iteration?

@csouth3
Contributor

csouth3 commented Dec 24, 2014

Yeah, that is actually straight from the RFC:

Explicitly specify HashMap's iterators to be non-deterministic between iterations.
This would allow e.g. next_back to be implemented as next, reducing code complexity.

The reason is so we can impl ExactSizeIterator (maybe?) as pczarn says (I suppose because we want rposition and len?). Though if so, I agree that it would probably be nice to untie it from DoubleEnded if possible, since the bound in this case imposes an undesirable restriction (of course, that would mean we'd have to find a new home for rposition, since that uses next_back()).
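To illustrate the RFC's point (with a hypothetical wrapper, not std's actual iterator types): once an iterator's order is declared meaningless, a `next_back` that simply forwards to `next` still yields every item exactly once:

```rust
// Hypothetical wrapper; not std code. Sound only because we
// declare that the order between the two ends has no meaning.
struct Unordered<I>(I);

impl<I: Iterator> Iterator for Unordered<I> {
    type Item = I::Item;
    fn next(&mut self) -> Option<I::Item> {
        self.0.next()
    }
}

impl<I: Iterator> DoubleEndedIterator for Unordered<I> {
    // "next_back implemented as next": both ends drain the same
    // underlying iterator, so no item is duplicated or skipped.
    fn next_back(&mut self) -> Option<I::Item> {
        self.0.next()
    }
}
```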

@bluss
Member

bluss commented Dec 24, 2014

It's not explicitly deciding anything about ExactSizeIterator

@csouth3
Contributor

csouth3 commented Dec 24, 2014

You are correct. To be completely clear, I was just speculating, because the referenced PR is the origin of the next_back/next discussion. Ultimately, beyond deciding that issue, the RFC doesn't detail exactly what to do here, so I defer to the powers that be for what action to actually take.

@TimDumol
Contributor

TimDumol commented Jan 1, 2015

I took a shot at implementing append() and split_off() for DList over here: #20406

alexcrichton added a commit to alexcrichton/rust that referenced this issue Jan 6, 2015
Part of collections reform part 1 and 2, rust-lang#18424 and rust-lang#19986

* shrink_to_fit
* swap_back_remove
* swap_front_remove
* truncate
* resize
bors added a commit that referenced this issue Jan 11, 2015
Implements the `append()` and `split_off()` methods proposed by the collections reform part 2 RFC.

RFC: rust-lang/rfcs#509
Tracking issue: #19986
@alexcrichton alexcrichton added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Aug 11, 2015
@apasel422 apasel422 added the A-collections Area: `std::collection` label Feb 6, 2016
@alexcrichton
Member

I believe this has all since been implemented, so closing.

@jooert
Contributor

jooert commented Feb 18, 2016

Actually, append and split_off have not been implemented for BTreeMap and BTreeSet yet.

@alexcrichton
Member

Ah good point.

@alexcrichton alexcrichton reopened this Feb 18, 2016
@xosmig
Contributor

xosmig commented Apr 22, 2016

I've started working on append and split_off methods for BTreeMap and BTreeSet.

@apasel422
Contributor

@xosmig #32466 already covers BTreeMap::append.

@xosmig
Contributor

xosmig commented Apr 22, 2016

@apasel422 OK. Then only split_off.

@xosmig
Contributor

xosmig commented Apr 28, 2016

Hello!
I've run into some problems.
To recalculate the sizes of the trees after split_off in O(log n) operations, it's necessary to know the exact size of each subtree.
So, there are several possible solutions:

  1. Add a size field to the LeafNode structure.

I think the benefits are great.

First, it lets us do very flexible splits and merges in only O(log n) operations.
Right now it's possible to merge trees in O(log n) operations, but that's not so interesting without a fast split.

Second, some other interesting and useful functions need it.
For example: kth_element or "get the number of elements between two values". Both in O(log n) operations.

Third, as I understand it (but I am not totally sure), it lets us implement a rope ( https://en.wikipedia.org/wiki/Rope_(data_structure) ) in an easy way, just as BTreeMap<(), V>.

But the overhead is notable (4 or 8 bytes per node, depending on the machine word size). I think it is possible to embed these sizes without memory overhead on 64-bit machines (due to alignment).

  2. Implement split_off in O(log n + "size of one of the results").

2.1) Split in O(log n) operations and then calculate the size of one of the results.

2.2) There is another algorithm that achieves the same complexity without the fast split, but (as I understand it) with a greater constant.

  3. Implement split_off in O(log n + "size of the smaller result"), but with a greater constant.

Split in O(log n) operations and then count the sizes of the two trees via two alternating iterators, stopping as soon as one of the sizes is known (sketched after this comment).

Which implementation should I use?
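A sketch of option 3's counting step (a hypothetical free function, not the real BTreeMap internals): after the O(log n) structural split, pull from both halves in alternation and stop once the smaller half runs out, for a cost of O(min(|left|, |right|)):

```rust
// Hypothetical helper. `total` is the combined length, known
// before the split; `left` and `right` iterate the two halves
// produced by the structural split.
fn split_sizes<L: Iterator, R: Iterator>(
    total: usize,
    mut left: L,
    mut right: R,
) -> (usize, usize) {
    let mut n = 0; // successful pulls from each side so far
    loop {
        if left.next().is_none() {
            // Left side exhausted after exactly `n` items.
            return (n, total - n);
        }
        if right.next().is_none() {
            // Right side exhausted after exactly `n` items.
            return (total - n, n);
        }
        n += 1;
    }
}
```

A split near either edge touches only a few elements this way; the worst case (an even split) still touches only about half of them.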

@xosmig
Contributor

xosmig commented Apr 29, 2016

@apasel422, what do you think?

@apasel422
Contributor

@xosmig This isn't my area of expertise, but I'd recommend opening a discussion thread on https://internals.rust-lang.org/ or submitting a pull request for whichever option is easiest and CCing @jooert and @gereeter. You might also want to take a look at #26227.

@xosmig
Contributor

xosmig commented Apr 29, 2016

@apasel422 The main question is "Is this overhead okay or not?" So, who can decide that?

@gereeter
Contributor

First, note that option one can be slightly optimized by only storing composite sizes in internal nodes.

I'm not a big fan of option one - I expect split_off to be a fairly rare operation, and slowing down everything else and increasing the memory footprint solely for its benefit does not seem like a good idea. Since we explicitly support large collections (as large as can fit in memory, basically), the size field would have to be a usize, which cannot be combined with other fields. The application to ropes is interesting, but really wants a separate implementation - the BTreeMap interface is totally wrong for the job (though the node module might be useful). kth_element and measuring the length of a range seem useful, but again, I see them as niche use cases that really want a specialized sequence data structure.

I don't have strong feelings between options two and three. I think that a variation on option two, where the choice of which side to count is made based on which edge was descended at the root node, would probably be the best. It doesn't have the overhead of option three, but still won't take long if you split near the edges.

There is also a fourth option, namely dropping support for constant-time calls to len(). This is a somewhat more radical change, but it would easily allow for split_off in O(log n) and avoid the question of how to calculate size completely. Moreover, while it might inconvenience users, it is easy for users to wrap a BTreeMap with something that does track length, recovering the behavior for anyone who actually needs accurate lengths. I suspect that there is much higher demand for is_empty than there is for len.
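A minimal sketch of that fourth option's escape hatch, assuming a user-side wrapper (`CountedMap` is a hypothetical name, not a std type):

```rust
use std::collections::BTreeMap;

// Tracks length outside the tree, so the map itself would be
// free to drop O(1) len() support.
struct CountedMap<K, V> {
    map: BTreeMap<K, V>,
    len: usize,
}

impl<K: Ord, V> CountedMap<K, V> {
    fn new() -> Self {
        CountedMap { map: BTreeMap::new(), len: 0 }
    }

    fn insert(&mut self, k: K, v: V) -> Option<V> {
        let old = self.map.insert(k, v);
        if old.is_none() {
            self.len += 1; // only brand-new keys grow the map
        }
        old
    }

    fn remove(&mut self, k: &K) -> Option<V> {
        let old = self.map.remove(k);
        if old.is_some() {
            self.len -= 1;
        }
        old
    }

    fn len(&self) -> usize {
        self.len // O(1), maintained by insert/remove
    }
}
```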

@xosmig
Contributor

xosmig commented Apr 30, 2016

@gereeter
About the rope: it can be implemented as a BTree with an implicit key. It needs just 2 more useful methods on BTreeMap: split_nth (splits off the first n elements) and erase_nth.

@xosmig
Contributor

xosmig commented Apr 30, 2016

@gereeter
"First, note that option one can be slightly optimized by only storing composite sized in internal nodes."
Yes, that's realy great point! There are about (a rough estimate, in fact, on average, more) 6 / 7 leafs. So only 1 / 7 of all nodes should contain 1 extra usize. Also there are about n / (1.5B) = n / 9 nodes on the average, where n is a number of elements in the container. So, only about 1 / 63 n extra usizes. It's only about 1 bit per element on a 64-bit machine and a half bit on a 32-bit machine. It's nothing, isn't it?
Now the size of InternalNode is about 11 * (size_of::<V>() + size_of::<K>()) + 112 on a 64-bit architecture. So, 8 extra bytes would be okay I think.

@xosmig
Contributor

xosmig commented Apr 30, 2016

If I haven't made a blunder, I am convinced that each internal node should store the size of its own subtree.

@xosmig
Contributor

xosmig commented Apr 30, 2016

The numbers above are averages. The worst case is n/5 nodes, of which 1/6 are internal, so n/30 extra usizes: about 2 extra bits per key on a 64-bit architecture and about 1 extra bit per key on a 32-bit architecture. That's also nothing.
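Spelling that worst-case estimate out as arithmetic (same numbers as above, nothing new assumed):

$$\frac{(n/30) \times 64\ \text{bits}}{n\ \text{keys}} \approx 2.1\ \text{bits/key (64-bit)}, \qquad \frac{(n/30) \times 32\ \text{bits}}{n\ \text{keys}} \approx 1.1\ \text{bits/key (32-bit)}$$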

bors added a commit that referenced this issue Jun 2, 2016
Implement split_off for BTreeMap and BTreeSet (RFC 509)

Fixes #19986 and refactors code shared with the append methods.
It splits the tree with O(log n) operations and then calculates sizes by traversing the lower one.

CC @gereeter
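For reference, the method as it landed in std (real API, stabilized after this PR):

```rust
use std::collections::BTreeMap;

fn main() {
    let mut map: BTreeMap<u32, &str> = (0..6).map(|i| (i, "v")).collect();
    // Keys >= 3 move into `high`; `map` keeps the rest. The split
    // itself is O(log n) plus the size recount described above.
    let high = map.split_off(&3);
    assert_eq!(map.len(), 3);
    assert_eq!(high.len(), 3);
}
```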