Multiset semantics for Coins and DecCoins #11223
Comments
I don't think multiset is the right abstraction. It is really a map-like structure constructed with an array. You wouldn't model 1000atoms as 1000 atom instances in a set which is what multiset semantics would imply. An appropriate data structure for any non trivial operations would be an ordered tree IMHO. With generics we can probably reuse the same logic for Int and Dec instances. |
Multiset semantics don't commit you to a particular representation. A multiset is sometimes written as a set with repeating elements, but it's also written as a non-repeating set where each element has a nonzero numeric multiplicity attached - which is pretty much what current I'm contrasting "multiset semantics" with other interpretations of an array of If we can consistently normalize to sorted, non-duplicated denoms, arrays seem simpler and more performant than trees - both have logarithmic lookup, etc. But as long as we settle the semantics, I truly don't much care about the representation. I'm curious about whether Go generics will be any easier than simply duplicating the code. In the early days of Go, I talked to Rob Pike about the lack of generics. He had spoken with the Java folks who told him about the immense pain of retrofitting generics onto a deployed language. Rob's takeaway was that generics should therefore be put off as long as possible. I would have thought that the lesson was to build them in from the beginning.. |
Generics are orthogonal to this point.
My understanding is that set membership is defined by element identity, not property. For example, a set of integers doesn't contain all possible integers, it contains only those specific integers which are in the set. Right? So it doesn't make sense to me that a Coins set would contain all possible denominations with a default "value" of zero. That's not a set! That's something else. You can of course define Coins, or any type, to have whatever semantics you want, and if this is the most useful model, then so be it. But AFAICT the thing you're describing here is not a set, or a multi-set, so it's confusing to use terminology from that domain. |
@peterbourgon please have a closer look at the Wikipedia entry for multiset, which describes that a multiset over a universe U of possible values can be equivalently thought of as:
There's a whole range of possible representations of these concepts in a programming language: arrays, hash maps, etc, and For your set-of-integers example, a set is completely defined by its indicator function f, which in this case would be a mapping from the entire set of integers to the set {0, 1}, where f(n) = 1 means n is in the set and f(n) = 0 means it's not in the set. So the indicator function, which represents the set, is indeed defined on all possible integers, and it returns the default value of zero for any integer that's not in your set. The multiplicity function above is a generalization of an indicator function, where the range is {0, 1, 2, 3, ...}. Now you certainly could define a collection of coins that makes a distinction between a zero-quantity entry and a missing entry - but that doesn't seem to be what I really am sorry to drag set theory into this - it's just that the behavior of |
Given two sets |
I'm definitely not going to overlook your discomfort with this, because you're demonstrating that my explanations need to be better! Let me try to anchor this in a use case rather than set theory. Suppose we have a rule at the bank that when you ask for a withdrawal of a given amount, and you don't have sufficient funds in your account, we give you as much money from your account as we can rather than refusing the request. Let's say you have four different accounts, each holding a single currency:
In each case, the withdrawal result is the minimum of the balance and the request. Now what would happen if instead we had all the same money in a single account, and requested all the withdrawals in a single request. Should we get the same aggregate result? You bet we should!
I contend that:
Does that make sense? |
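As a rough sketch of the single-account case described above, the withdrawal result can be computed as a per-denom minimum in which a denom missing from either side counts as zero. The Coin type, the int64 amounts, the helper names (amounts, combine, MinCoins, MaxCoins) and the balances in the note below are all illustrative assumptions, not the SDK's types or the figures from the original example.

```go
package main

import "sort"

// Coin is a stand-in (denom, amount) pair; int64 replaces the SDK's Int for brevity.
type Coin struct {
	Denom  string
	Amount int64
}

// amounts indexes a coin list by denom. A denom that is absent reads back as 0,
// which is the multiset view of a sparse representation.
func amounts(cs []Coin) map[string]int64 {
	m := make(map[string]int64, len(cs))
	for _, c := range cs {
		m[c.Denom] += c.Amount
	}
	return m
}

// combine applies pick to the two amounts of every denom that appears in a or b,
// drops zero results, and returns the rest sorted by denom.
func combine(a, b []Coin, pick func(x, y int64) int64) []Coin {
	am, bm := amounts(a), amounts(b)
	seen := make(map[string]bool, len(am)+len(bm))
	for d := range am {
		seen[d] = true
	}
	for d := range bm {
		seen[d] = true
	}
	var out []Coin
	for d := range seen {
		if v := pick(am[d], bm[d]); v != 0 {
			out = append(out, Coin{Denom: d, Amount: v})
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Denom < out[j].Denom })
	return out
}

// MinCoins is the per-denom minimum (multiset intersection).
func MinCoins(a, b []Coin) []Coin {
	return combine(a, b, func(x, y int64) int64 {
		if x < y {
			return x
		}
		return y
	})
}

// MaxCoins is the per-denom maximum (multiset union).
func MaxCoins(a, b []Coin) []Coin {
	return combine(a, b, func(x, y int64) int64 {
		if x > y {
			return x
		}
		return y
	})
}
```

For example, with a hypothetical balance of {10 usd, 5 eur, 100 jpy} and a request for {3 usd, 8 eur, 4 gbp}, MinCoins yields {5 eur, 3 usd}: jpy was not requested and gbp is not held, so both come out as zero and are dropped.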
I understand and agree with most of your assertions about this example. But I don't think Min is the name for this operation. And I think it may boil down to this:
I don't think I agree with this, because an element of a set with value of zero isn't the same as that element not being in the set. You can ask a Coins how many USD it has, and it can respond zero when it doesn't have a USD denom represented within itself, but that's not the same thing. edit: Also, the set of all possible denominations is unbounded and unknowable (in general)! Min on a Coins is, to me, an operation which produces a union set of all represented denominations among all input sets, where each denomination's value is the lowest among all input values for that denomination.
|
Since
...and so on. So the idea that a missing denom in I can imagine a different type that allowed
@aaronc The above discussion, along with @robert-zaremba 's comments in #11200, are why I think this issue is independent of the Pulsar work. Pulsar, as I understand it, will let you build many different kinds of data types, but does not decide for you what semantics your data type should have. This can be settled now in the existing |
I'm just arguing for a different name for the operation. Something like e.g. edit: related |
I'm very in favor of multiset semantics for all the comparison operators. I can never use the For Min, and Max, I think they are better served by I don't think there should be a function named |
Agree. But I don't know a better short name.
These operations are neither intersection nor union. |
They are the intersection and the union, respectively, as commonly defined on multisets? What's the behavior you'd expect out of intersect / union that's not being done in @JimLarson's PR? |
Guys, the naming was discussed at length in the PR, and ultimately it got multiple approvals from core developers and was committed. I and others have since been writing code using the chosen name and semantics. This progress doesn't prevent us from revisiting the issue if necessary, but it does raise the bar for further discussion. If you have an objection, I request that it be grounded in an actual use case, and with a proposal for something better. As I mentioned above, I think the naming objections will subside if we can get some design clarity on I've got a work-in-progress overhaul of
Every two-
For ordering, I wrote utilities
This changes the current behavior for the I've got some other urgent priorities right now, but I'll try to get back to this soon. In the meantime, I'll probably submit a tiny change to keep |
The Min and Max have a very good doc string.
|
What you described as I agree that the definition of union is surprising, which is another reason I prefer |
Summary
Cleans up potentially dangerous inconsistencies in Coins and DecCoins by establishing a multiset model for semantics and a normalization policy.
Problem Definition
Coins (and DecCoins) are sometimes vague and surprising in their semantics.
IsEqual() returns false when comparing Coins of different lengths, but panics when it encounters Coins of the same length but having a difference in denoms. This appears to be intentional, as there are unit tests for the behavior. There doesn't seem to be a sensible use case for these semantics.
The call a.IsAnyGT(b) sounds like it ought to be equivalent to !a.IsAllLTE(b), but it's something quite different, and deliberately so. For instance, the doc comment says {2A, 3B}.IsAnyGT{5C} = false, despite the fact that the 2 of denom A is greater than the zero amount of A in {5C}. Similarly for IsAnyGTE().
There is inconsistency in canonicalization of Coins values.
Negative quantities are forbidden by Validate() but returned by SafeSub(). Add() promises not to return any negative quantities but has no checks to enforce this.
Some methods explicitly handle non-valid values, others panic or return wrong results. For the empty Coins value, NewCoins() uses the empty slice, but Add() and Sub() use nil. Most uses are happy with either, but some tests require one specifically.
There's much hand-wringing about the existence of zero-quantity Coin values but the non-existence of zero quantities in the representation of Coins values. There's a question about the validity of an empty Coins value. See Write ADR for Coins #7046 and its references.
This all seems to stem from confusion of the implementation of Coins with the semantics it provides, the lack of clear definition of those semantics, and the absence of a policy of normalization.
Proposal
Multiset model
Each Coins value should be abstractly thought of as a collection of coins of all possible denominations. This makes Coins a multiset, which suggests some operations and their proper behavior.
A multiset has an obvious partial order based on containment: A <= B if and only if for all denoms D, A.AmountOf(D) <= B.AmountOf(D). Note that this is a partial order, so it's possible for neither A <= B nor B <= A.
The partial order gives rise to a well-known mathematical structure, a lattice. See #10995 which adds Min() and Max() operations, which are indispensable in various vesting account calculations.
If the set of denoms was stable, the SDK could have defined Coins as a fixed-length vector of Int, with each denom getting a well-known offset. However, since we expect a large and dynamic set of denoms, with most Coins values having only a few of these nonzero, then the current "sparse" representation makes more sense. But the semantics should be the same for both.
This interpretation settles the hand-wringing over the possibility of zero-quantity denoms in Coin but not in Coins. That worry confuses the sparse implementation of Coins with the data it provides: you see the zero if you ask with AmountOf().
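The containment order can be sketched in a few lines. This is a minimal illustration under stated assumptions: Coin and Coins here are simplified stand-ins with int64 amounts, inputs are assumed to hold no negative amounts and no duplicate denoms, and the method bodies are not the SDK's implementations.

```go
package main

// Coin is a simplified stand-in for the SDK type; int64 replaces Int.
type Coin struct {
	Denom  string
	Amount int64
}

// Coins is the sparse representation: only nonzero denoms are stored.
type Coins []Coin

// AmountOf returns the stored amount, or zero for any denom that is not stored.
// A linear scan keeps the sketch short and does not depend on ordering.
func (a Coins) AmountOf(denom string) int64 {
	for _, c := range a {
		if c.Denom == denom {
			return c.Amount
		}
	}
	return 0
}

// IsAllLTE reports whether a <= b in the containment order, i.e. whether
// a.AmountOf(D) <= b.AmountOf(D) for every denom D. Only denoms stored in a
// need checking: any other denom has amount zero in a, which cannot exceed
// a non-negative amount in b.
func (a Coins) IsAllLTE(b Coins) bool {
	for _, c := range a {
		if c.Amount > b.AmountOf(c.Denom) {
			return false
		}
	}
	return true
}

// IsAnyGT under multiset semantics is simply the negation of IsAllLTE:
// some denom has a strictly larger amount in a than in b.
func (a Coins) IsAnyGT(b Coins) bool {
	return !a.IsAllLTE(b)
}
```

With a = {2A, 3B} and b = {5C}, neither a.IsAllLTE(b) nor b.IsAllLTE(a) holds, so the two values are incomparable, and a.IsAnyGT(b) is true because the 2 of A exceeds the zero of A in b.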
Normalization policy
The Coins representation as an array of (denom, amount) pairs makes sense, and allows direct representation in proto-derived structs. But it also allows representation of some values that do not match up with the abstract model of Coins: invalid denoms, negative quantities, zero quantities, duplicate denoms, and entries out of sorted order.
When the representation is created with the NewCoins() constructor, all of these are checked for, but protos generate the representation directly, as do some tests and an unknown amount of user code.
Nothing can be done about invalid denoms or negative quantities. Sort() will handle order, and a trip through NewCoins() will handle both order and zero quantities. Nothing fixes duplicates at the moment, but we could potentially consolidate them by adding the quantities together. It's not clear whether reforming such content would fix bugs or merely obscure them.
Methods vary in their support for invalid representations. IsZero() seems to exist (vs Empty()) only for zero quantities. IsEqual() sorts its inputs just to be sure, while AmountOf() silently relies on its inputs being ordered. Application code can't tell what to expect without reading each implementation. This is a mess.
The "highbrow" approach to regularizing this would be to create a ValidCoins type with all of the operations, leaving Coins as the type of the proto field, having only a method Coins.Validate() (ValidCoins, err) to canonicalize the coins.
A more middle-brow approach would be to simply let most methods assume valid input, but require all to produce valid output. Application logic should explicitly normalize values coming from "outside", e.g. transactions, but can consider the store to only hold valid values.
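One way the "highbrow" option might look is sketched below. Everything here is an assumption for illustration: the simplified Coin with int64 amounts, the non-empty check standing in for real denom validation, the choice to merge duplicate denoms by summing (which the text above leaves open), and the Slice and AmountOf helpers.

```go
package main

import (
	"errors"
	"sort"
)

// Coin is a simplified stand-in for the SDK type; int64 replaces Int.
type Coin struct {
	Denom  string
	Amount int64
}

// Coins is the raw, possibly non-canonical representation (e.g. decoded from a proto).
type Coins []Coin

// ValidCoins wraps a canonical slice: sorted by denom, no duplicates, all amounts
// positive. The field is unexported so other packages can only obtain a value
// through Validate.
type ValidCoins struct {
	coins []Coin
}

// Validate rejects what cannot be repaired (bad denoms, negative amounts) and
// normalizes the rest (duplicates, zero amounts, ordering).
func (cs Coins) Validate() (ValidCoins, error) {
	totals := make(map[string]int64, len(cs))
	for _, c := range cs {
		if c.Denom == "" { // placeholder for real denom validation
			return ValidCoins{}, errors.New("invalid denom")
		}
		if c.Amount < 0 {
			return ValidCoins{}, errors.New("negative amount for denom " + c.Denom)
		}
		totals[c.Denom] += c.Amount // assumption: duplicates are merged by summing
	}
	var out []Coin // stays nil when nothing survives, so the empty value is nil
	for d, amt := range totals {
		if amt > 0 {
			out = append(out, Coin{Denom: d, Amount: amt})
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Denom < out[j].Denom })
	return ValidCoins{coins: out}, nil
}

// AmountOf can rely on the sorted invariant and use binary search;
// a denom that is not stored has amount zero.
func (v ValidCoins) AmountOf(denom string) int64 {
	i := sort.Search(len(v.coins), func(i int) bool { return v.coins[i].Denom >= denom })
	if i < len(v.coins) && v.coins[i].Denom == denom {
		return v.coins[i].Amount
	}
	return 0
}

// Slice returns a copy so callers cannot break the invariants.
func (v ValidCoins) Slice() []Coin {
	return append([]Coin(nil), v.coins...)
}
```

Because every operation on ValidCoins can assume the invariants, none of them needs the defensive sorting or checking that the current methods apply unevenly.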
DecCoin
DecCoins is mostly a clone of Coins with Dec instead of Int for quantities, but similarly requires positive quantities for valid values. Thanks to Golang's impoverished type system, we need to duplicate the implementation.
All of the above concerns and suggestions seem to apply.
Specific Changes
Let me sum this all up in the following specific proposals.
Fix IsEqual() to not panic. Though technically an incompatible change, it's hard to imagine anyone relying on the current behavior.
Rewrite IsAnyGT{E}() in terms of !IsAllLT{E}() and change to multiset semantics. Again, an incompatible change, but I think it's likely that the usages actually expect the multiset behavior.
Deprecate or remove DenomsSubsetOf() since the very question doesn't make sense in multiset semantics.
SafeSub should return empty coins if hasNeg is true, to prevent negative-quantity Coins from leaking out. A quick review of the code suggests that the returned coins value is never used if the hasNeg field is set. (Except for one case in a test, which can probably be rewritten.)
All methods should handle nil and empty slice the same way. E.g. always test with len(coins) == 0 instead of coins == nil. Code (including tests) should not try to distinguish between the two.
Nevertheless, pick nil as the canonical representation of the empty Coins value since it's the "zero value" dictated by the Go language.
Application code and tests should not create coins as arrays but should instead use NewCoins.
Expect validation / canonicalization at the boundary (via NewCoins(dirtyCoinArray...) or an equivalent alias) and don't do checks or conversions in other methods. Possibly have a transitional period where all methods check for invalid input and panic.
Equivalent changes in DecCoins.