
Fold FixedBoundedFloatStrategy into FloatStrategy #3327

Merged — 27 commits, merged into master on May 24, 2022
Conversation


@pschanely pschanely commented May 2, 2022

Closes #2907.


@Zac-HD Zac-HD left a comment


Thanks Phillip! This is a much nicer patch than I was anticipating would be required, so pending the comments below and some passing tests, I'm keen to merge!

(PyCon US sprints are this Mon+Tues, so I might even see if someone wants to do a filter-rewriting PR...)

hypothesis-python/RELEASE.rst (review thread, outdated; resolved)
```python
allow_nan: bool,
allow_subnormal: bool,
width: int,
bounds: Optional[Tuple[float, float]] = None,
```
Member:

Let's make this a min_value and max_value, defaulting to 0.0 and +inf respectively. We can then be much less conditional everywhere else.

Contributor Author:

I also was unsatisfied with the bimodal-ness of this logic.

So, for one, I think the unbounded FloatStrategy still needs to be able to return negatives. Also, the way we do bounded selection is based on fractional_float, which I don't think will work with +inf. Or it's possible (likely?) that I'm missing the context for why these aren't issues.

One thing I did try - including the appropriate NASTY_FLOATs when bounds are specified. This sort of works, but requires special handling to avoid -0.0 and a possibly empty list. Then some shrinking tests started failing in a way that made me worry I was causing problems there. Insights appreciated!

Member:

The underlying FloatStrategy is only positive IIRC, so st.floats() has to return a union of two Strategy objects if the allowed range spans zero.

Contributor Author:

It appears that the unbounded case doesn't do this union. But this is a big function, and I know that I am not following all of it.

Member:

Oh, right, yeah, unbounded float generation is positive-only in conjecture.utils (or conjecture.floats? anyway...), with a sign bit added before you get to FloatStrategy.

Contributor Author:

Yeah. Still happy to refactor as you see fit - LMK!

Member:

Let's push the management of sign-bits-for-bounded-floats down, from st.floats() into FloatsStrategy. The high-level goal here is that FloatsStrategy should hold enough of the arguments to fully represent a call to floats() - so that the solver/fuzzer/filter-rewriting interface will pretty much "wrap the current FloatsStrategy.do_draw()".

Kinda annoying now, sorry, but I think it'll pay off in the medium-term.
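As a purely hypothetical illustration of "hold enough of the arguments to fully represent a call to floats()", the strategy might carry fields like these — the class name and defaults below are illustrative, not Hypothesis's actual internals:

```python
# Hypothetical shape only: the fields mirror st.floats()'s signature,
# so a solver/fuzzer backend could wrap do_draw() with full information.
from dataclasses import dataclass

@dataclass(frozen=True)
class FloatsStrategySketch:
    min_value: float = float("-inf")
    max_value: float = float("inf")
    allow_nan: bool = True
    allow_subnormal: bool = True
    width: int = 64
```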

Contributor Author:

Not annoying!

OK, so you might imagine roughly this logic getting incorporated into the FloatStrategy class (though it would need to be implemented rather differently, since strategy-composition isn't a tactic that would be as readily available). Is that about right?

Backing up, some questions that come to mind:

  1. Would the ideal mid-level interface be identical to st.floats()? (FWIW, I think CrossHair could implement st.floats(), FloatStrategy, or anything in between without much trouble) One could argue that some amount of normalization and consistency checking is better off at the higher level.
  2. Does it matter whether that interface is a function or a class? I'm not familiar with the filter-rewriting stuff, but I imagine that gets a lot easier if this interface is a class-level thing, right? And that's why we care?

Member:

> OK, so you might imagine roughly this logic getting incorporated into the FloatStrategy class (though it would need to be implemented rather differently, since strategy-composition isn't a tactic that would be as readily available). Is that about right?

Yep, that sounds right to me. Caveat: we should check that this doesn't noticeably hurt performance in the interim period.

> 1. Would the ideal mid-level interface be identical to st.floats()? (FWIW, I think CrossHair could implement st.floats(), FloatStrategy, or anything in between without much trouble) One could argue that some amount of normalization and consistency checking is better off at the higher level.

I think a mid-level interface I would want is "like floats(), but inclusive bounds are required and there's no sign bit or allow_subnormal". So we're really just talking about finite floats in some contiguous range, plus maybe nan (only if inf is allowed). It's tractable to efficiently generate this with bit-twiddling to get close enough to right that at least half of attempts are valid, and that's good enough that rejection sampling is really fast.

It's deeply weird to humans, but this should work really well for fuzzers and pretty well for solvers too. Then FloatsStrategy is responsible for composing positive and negative ranges (if applicable), zero (if subnormals disallowed), etc.
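A rough sketch of that bit-twiddling-plus-rejection idea for non-negative bounds (illustrative only — `draw_in_range` and its structure are not Hypothesis's actual internals): the IEEE-754 bit patterns of non-negative floats are ordered the same way as the floats themselves, so masking a random integer down to the bit-width of the upper bound's pattern lands in range often enough for rejection sampling to be cheap.

```python
import random
import struct

def float_to_bits(x: float) -> int:
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def bits_to_float(n: int) -> float:
    return struct.unpack(">d", struct.pack(">Q", n))[0]

def draw_in_range(lo: float, hi: float, rng: random.Random) -> float:
    """Draw a finite float in [lo, hi] (requires 0 <= lo <= hi): for
    non-negative floats the bit patterns are monotonic, so we mask a
    random integer to hi's bit-width and reject out-of-range draws."""
    lo_bits, hi_bits = float_to_bits(lo), float_to_bits(hi)
    mask = (1 << hi_bits.bit_length()) - 1
    while True:
        candidate = rng.getrandbits(64) & mask
        if lo_bits <= candidate <= hi_bits:
            return bits_to_float(candidate)
```

Note the distribution this produces is roughly log-uniform over exponents, which is exactly the "deeply weird to humans" part.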

> 2. Does it matter whether that interface is a function or a class? I'm not familiar with the filter-rewriting stuff, but I imagine that gets a lot easier if this interface is a class-level thing, right? And that's why we care?

The filter rewriting will be invoked via FloatsStrategy.filter(), so I'd build FloatsStrategy.do_draw() directly in terms of the mid-level interface. I'd imagined that the interface is basically "provide an object with these callbacks", which, yeah, is very likely to be an instance of some class with those methods.

@pschanely pschanely marked this pull request as ready for review May 3, 2022 01:46
@pschanely

Apologies for delays - I was travelling this past week. I think it's close; maybe take a peek and see if I got the right idea?

To emulate the strategy-unioning that was previously happening, I've moved the sign bit up front. A few niggly tests are still causing me trouble, though - hints welcome!

@Zac-HD Zac-HD left a comment

Hmm, this looks sensible to me at an evening glance! Of what I see, I'm most suspicious of the offset logic; ideally we'd get to a mask-based rather than addition-based solution (see e.g. #2907, though I'd like to do as little as possible of that to get this working and merged).

Comment on lines 195 to 196:

```python
min_value: Real = -math.inf,
max_value: Real = math.inf,
```
Member:

These should definitely be float annotations - we convert other types in st.floats(), and it's useful to have the exact bounds here.

I'd expect to also need an allow_subnormal flag sometimes? But maybe not yet...

Contributor Author:

ACK on the float annotations.

allow_subnormal could be part of the mid-level interface, I think. Another decision I considered but don't understand the consequences of: is it better to filter the subnormals out, or to cast them into normals, as I'm doing with widths? (Assuming it's not part of this interface.)

@pschanely

> Hmm, this looks sensible to me at an evening glance! Of what I see, I'm most suspicious of the offset logic; ideally we'd get to a mask-based rather than addition-based solution (see e.g. #2907, though I'd like to do as little as possible of that to get this working and merged).

Feel like some sort of bit mask should be possible; I can play with this idea this week.

@pschanely pschanely requested a review from DRMacIver as a code owner May 10, 2022 18:59
@pschanely

> Hmm, this looks sensible to me at an evening glance! Of what I see, I'm most suspicious of the offset logic; ideally we'd get to a mask-based rather than addition-based solution (see e.g. #2907, though I'd like to do as little as possible of that to get this working and merged).
>
> Feel like some sort of bit mask should be possible; I can play with this idea this week.

At first glance, it feels like bounded integer draws for the exponent (and for very small ranges, the mantissa) would be even more targeted than a bitmask, right?

Unrelated: I'm getting a test failure that I think suggests an independent problem. Even on the main branch, the following assertion fires, though I think it shouldn't. Could you confirm and suggest a fix? The logic in _calc_p_continue is rather opaque.

```
$ python -c 'from hypothesis.internal.conjecture.utils import _calc_p_continue; _calc_p_continue(1e-5, 1906)'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "hypothesis-python/src/hypothesis/internal/conjecture/utils.py", line 468, in _calc_p_continue
    assert 0 < p_continue < 1
AssertionError
```

@Zac-HD

Zac-HD commented May 11, 2022

Bounded integer draws are just bitmask-and-rejection-sampling: if you've got the right number of bits, then in expectation you succeed in fewer than two attempts 😁. Admittedly that goes down to a merely better-than-one-in-four chance for floats, assuming that we calculate the mantissa mask based on the observed exponent, but that's still pretty good IMO, and the simplicity of an atomic representation which works with our existing mutators is probably worth the small overhead.
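For concreteness, the bitmask-and-rejection-sampling scheme for bounded integers looks roughly like this (a sketch, not Hypothesis's actual implementation):

```python
import random

def draw_bounded_int(n: int, rng: random.Random) -> int:
    """Uniform draw from [0, n]: mask a random value down to
    n.bit_length() bits and reject anything above n.  Since
    n >= 2**(bits - 1), each attempt succeeds with probability
    greater than 1/2, so we expect fewer than two attempts."""
    bits = n.bit_length()
    if bits == 0:
        return 0  # n == 0: only one possible value
    while True:
        value = rng.getrandbits(bits)
        if value <= n:
            return value
```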

As to the other, fixed in 224425e though I'm pretty sure it's truly unreachable via the public API.

@pschanely

> Bounded integer draws are just bitmask-and-rejection-sampling: if you've got the right number of bits, then in expectation you succeed in fewer than two attempts 😁. Admittedly that goes down to a merely better-than-one-in-four chance for floats, assuming that we calculate the mantissa mask based on the observed exponent, but that's still pretty good IMO, and the simplicity of an atomic representation which works with our existing mutators is probably worth the small overhead.

Ah! Got it. Take a look at what I've got now - I'm not sure I'm doing the "make your own luck" process right? Although I'm drawing bits, I'm using integer modulo instead of the mask still - seems behaviorally similar, but perhaps nice to avoid the secondary rejection stage.

> As to the other, fixed in 224425e though I'm pretty sure it's truly unreachable via the public API.

Thank you!

If this looks good, I can add something for subnormal rejection ... and then we might be good?

@Zac-HD

Zac-HD commented May 12, 2022

Yeah, I think we're close! My only real concern at this point is performance, which is at least not catastrophically worse at the moment. From code review I don't even have a specific reason for concern beyond the test failure below, but it would be nice to compare a profile of something float-intensive (unique arrays, maybe?) before and after. Happy to do that over the weekend if necessary.

```
_______________ test_floats_order_worse_than_their_integral_part _______________
Traceback (most recent call last):
  File "tests/conjecture/test_float_encoding.py", line 101, in test_floats_order_worse_than_their_integral_part
    @given(st.integers(1, 2**53), st.floats(0, 1).filter(lambda x: x not in (0, 1)))
  File "hypothesis/core.py", line 1237, in wrapped_test
    raise the_error_hypothesis_found
  File "hypothesis/internal/healthcheck.py", line 27, in fail_health_check
    raise FailedHealthCheck(message, label)
hypothesis.errors.FailedHealthCheck: It looks like your strategy is filtering out a lot of data.
    Health check found 50 filtered examples but only 9 good ones.
```

@pschanely

I haven't been able to successfully diagnose these (seemingly sporadic) health check failures yet, but I presume I'm doing something that's not great ... somewhere. I'll continue to poke at it, but if you have suggestions on how you diagnose this kind of problem, I'm all ears.

FWIW, I also did a microbenchmark on just generating floats (without any sort of post-filter applied); it seems slower, but by a small amount in comparison to how long, for example, ConjectureData.draw_bits() takes.

@Zac-HD

Zac-HD commented May 13, 2022

Main suggestion is to set --hypothesis-seed=0, which should at least mean you can get a deterministic repro. Cycle through seeds if that's what it takes...

@pschanely

Maybe working (partly) in the lex_to_float/float_to_lex space would help? I suspect though that we're just going to need some more bits to choose whether we're in the log-or-linear mode, which maybe could be determined based on the "spare" entropy defined by the disallowed floats somehow?

There's that high bit that's used to select the integer interpretation (heh, looking at it now, it seems like I severed that path with the clamper ... but we can fix that). We could expand that integer interpretation to allow for some number of decimal places. Maybe put the decimal digits up front, least-significant-first, so shrinking is still nice. If we wanted to get super fancy, we could vary the number of decimal digits by the size of the allowed range.

@Zac-HD

Zac-HD commented May 19, 2022

Just going with quick responses, because it's late but I'd rather give you something before the weekend:

  • Let's just set aside the distributional problem for now. It is important, but it's also a small and easily-isolated part of the problem; if everything else is lined up I can probably snipe it in half a day but not much less. Worst case, we spend some extra entropy to use a Sampler which can give us whatever distribution over exponents we want (which might actually be a good move 🤔, also works well with swarm testing...)
  • I am not crazy enough to try something with decimal digits. Integers and binary digits are plenty, and already supported by the format.
  • But regardless of that part, next steps are per the review above.

@pschanely

> • Let's just set aside the distributional problem for now. It is important, but it's also a small and easily-isolated part of the problem; if everything else is lined up I can probably snipe it in half a day but not much less. Worst case, we spend some extra entropy to use a Sampler which can give us whatever distribution over exponents we want (which might actually be a good move 🤔, also works well with swarm testing...)

Acknowledged. I've pushed the more straightforward changes at least.

I also briefly thought above that I did something bad to the integer representation with the clamper. Not true; we should be all good. (The clamper operates on the real float after un-lex-ing, and is independent of that representation.)

@Zac-HD Zac-HD left a comment

LGTM, next steps are "on the weekend I'll have time and brainspace to attack the exponent thing".

And I can't say this often or sincerely enough, thank you for your generous and skillful contributions to some of the trickiest parts of Hypothesis and the broader testing ecosystem 😍

hypothesis-python/RELEASE.rst (review thread, outdated; resolved)
hypothesis-python/tests/cover/test_float_utils.py (review thread, outdated; resolved)
Comment on lines 94 to +96:

```diff
 def test_minimal_non_boundary_float():
     x = minimal(floats(min_value=1, max_value=9), lambda x: x > 2)
-    assert 2 < x < 3
+    assert x == 3  # (the smallest integer > 2)
```
Member:

This is good, and I'll want to additionally tighten up the other tests to check we get the exact shrinking behavior we want. After we deal with the large-negative-exponent thing.

Contributor Author:

Did some tightening, in this file at least. Any other places?

Member:

This file is all that springs to mind, and it looks like you've tightened it all up 👍

Oh, how about adding a new test that floats between say (1.1, 1.6) shrinks to 1.5? Asymmetric interval going to nice values, rather than nice fractions of the allowed interval.

@Zac-HD

Zac-HD commented May 23, 2022

OK, reasoning this through:

  • If the initially-drawn float is permitted, we want to keep it.
    • Which means that make_float_clamper will be responsible for fixing the distribution
    • It doesn't need to be perfect; any reasonably-bug-revealing heuristic will do - the whole point is to make it possible to use fancier techniques like CrossHair and HypoFuzz
  • I think the best option is to "clamp" by choosing from a uniform random distribution (like the old FixedBoundedFloats() strategy).
    • This works out to a 50/50 split between uniform and log-uniform (i.e. unbiased bits) for floats(0, 1)
    • when more exponents are allowed, the uniform component is less important
    • when fewer are allowed (e.g. floats(0.25, 1)) then the uniform component is weighted more heavily but distorts the distribution less.
  • In order to balance many-possible-values with low-correlation-with-initial-draw-failing, I propose that we should just use all the mantissa bits to form a float in the range $[0, 1)$ and then use that as FixedBoundedFloats() did.
    • In cases where the allowed interval spans zero, let's keep the sign bit and treat our multiplier as "indexing" into the allowed interval-with-that-sign
    • Specifically, let's exclude zero from this, and only generate between the least nonzero float and the max value for positive (etc for negative); this makes handling allow_subnormal=False much easier
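A minimal sketch of the keep-if-valid-else-uniform-clamp idea above (illustrative only: `clamp_float` is a hypothetical name, and the real make_float_clamper works on bit patterns and also handles signs, zero-exclusion, and subnormals):

```python
import struct

MANTISSA_MASK = (1 << 52) - 1  # low 52 bits of an IEEE-754 double

def clamp_float(drawn: float, lo: float, hi: float) -> float:
    """Keep the initially-drawn float if it's already permitted;
    otherwise reuse its 52 mantissa bits as a fraction in [0, 1) and
    map that uniformly onto [lo, hi], as FixedBoundedFloatStrategy did."""
    if lo <= drawn <= hi:
        return drawn
    bits = struct.unpack(">Q", struct.pack(">d", drawn))[0]
    fraction = (bits & MANTISSA_MASK) / (1 << 52)
    return lo + fraction * (hi - lo)
```

Because the mantissa bits of the out-of-range draw are reused, the clamped value is still a deterministic function of the underlying bytes, which is what keeps shrinking and mutation well-behaved.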

Thoughts?

(also, explicit reminder for the record: I have a policy that any contributor can ask me to take over a PR if they don't want to work on it anymore - I want contributing to be a good experience, and to get PRs even when the issue might turn out to be a tarpit!)

@pschanely

pschanely commented May 23, 2022

> Thoughts?

Clever! Love it. And not a big change to make. I've just pushed something.

Let me know if I understood you correctly, and then I'll go back through the TODOs and whatnot.

@Zac-HD

Zac-HD commented May 23, 2022

Yep, that's exactly what I was thinking of (and the division-by-mask to implement it is remarkably elegant).

@Zac-HD Zac-HD left a comment

Just a few tiny tweaks, and then we're ready to merge 🎉 🎉 🎉

hypothesis-python/tests/nocover/test_simple_numbers.py (review thread, outdated; resolved)

@pschanely

Not quite getting the result I expected with shrinking (1.1, 1.6) to 1.5. Investigation in progress ...

@pschanely

> Not quite getting the result I expected with shrinking (1.1, 1.6) to 1.5. Investigation in progress ...

Well! I didn't figure it out. (1.1, 1.9) DOES get to 1.5, and I've added that test. But all is not well. (1.1, 1.6) yields 1.501953125. Here is my best attempt at explaining what happens, but I'm not confident about my understanding:

There is a one-bit difference between 1.501953125 and 1.5; in the lex form, it's the last bit in the next-to-last byte. The shrinker successfully clears the earlier mantissa bytes. We do attempt to clear that bit, but at the same time we roll the final byte over to 255. When bit-reversed, those are the high bits of the mantissa, and we run afoul of the clamper's upper bound. That's why increasing the upper bound to 1.9 makes it work.

I've noticed some float-specific shrinking logic (but have not yet spent more than a moment trying to grok it). Even when attempting to disable that logic, I'm seeing approximately the same behavior on this test, so I suspect it's not relevant here. But I don't fully trust myself on that assertion either.

Ehhhh. So, not sure exactly what to do next. Possibly it's not really a regression. At the same time, I might be making changes that make pre-existing suboptimal shrinking more likely. What do you think?
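For reference, the one-bit difference described above is easy to confirm from the raw IEEE-754 bit patterns (a quick illustration of the plain bit patterns, not the bit-reversed lex encoding):

```python
import struct

def float_bits(x: float) -> int:
    """64-bit IEEE-754 pattern of a float, as an integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

# 1.501953125 == 1.5 + 2**-9, so the two patterns differ in exactly
# one mantissa bit (bit 43 of the 52-bit mantissa field).
diff = float_bits(1.5) ^ float_bits(1.501953125)
assert diff == 1 << 43
assert bin(diff).count("1") == 1
```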

@Zac-HD Zac-HD left a comment

Well, we have a substantial improvement over the status quo already - in the worst cases, shrinking merely isn't noticeably better. Since I think fixing this last case would be a substantial adventure into the shrinker, I'm inclined to merge this PR as-is; I'll write up a tracking issue for the floats(1.1, 1.6) case tonight.

Thanks again for all your work on this!

@Zac-HD Zac-HD enabled auto-merge May 24, 2022 18:24
@Zac-HD Zac-HD merged commit ef5ec3c into HypothesisWorks:master May 24, 2022
@pschanely

Exciting! I know first hand that shepherding PRs is a ton of work too. Thank you for all your help!


Successfully merging this pull request may close these issues.

Overhauling the st.floats() internals