Use more platform-independent random test ordering #39441
Merged
Summary
SUMMARY: Infrastructure "Make test failures from other platforms more easily reproducible"
Purpose of change
Currently we have difficulty reproducing test failures across platforms. One reason is that we run the tests in declaration order, which depends on the build system and linker. We could use lexicographic order, but random order would be even better.
To that end, it would be helpful if random order (for the same seed) was the same across platforms. This is an attempt to achieve that.
Furthermore, with this change, when a subset of the tests are run, they run in the same order as they would have when more (or all) tests are run. This makes inter-test dependency bugs easier to track down by finding the smallest set of tests which reproduces them.
Describe the solution
Rather than randomly shuffling the tests, we now sort them by an integer value associated with each test. That value is derived from the random seed and the test name in a deterministic manner.
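For illustration, here is a minimal sketch of the idea, not the code in this PR. The names `order_key` and `ordered_tests` are hypothetical, and FNV-1a is only an assumed choice of hash; any hash built from fixed-width integer arithmetic would serve the same purpose.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: FNV-1a over the test name, with the seed folded into
// the starting state. Only fixed-width unsigned arithmetic is used, so the
// key does not depend on the platform's std::hash or distribution classes.
inline std::uint64_t order_key(std::uint64_t seed, const std::string& name) {
    std::uint64_t h = 14695981039346656037ULL ^ seed;  // FNV offset basis mixed with the seed
    for (unsigned char c : name) {
        h ^= c;
        h *= 1099511628211ULL;  // FNV prime; wraps modulo 2^64
    }
    return h;
}

// Hypothetical helper: sort test names by their per-test key.
// Ties are broken by name so the result never depends on input order.
inline std::vector<std::string> ordered_tests(std::uint64_t seed,
                                              std::vector<std::string> names) {
    std::sort(names.begin(), names.end(),
              [seed](const std::string& a, const std::string& b) {
                  const std::uint64_t ka = order_key(seed, a);
                  const std::uint64_t kb = order_key(seed, b);
                  return ka != kb ? ka < kb : a < b;
              });
    return names;
}
```

Because each key depends only on the seed and the test's own name, the relative order of any two tests is the same whether they are run alone or as part of the full suite.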
Describe alternatives you've considered
std::uniform_int_distribution might also introduce platform dependence, since the standard does not specify its algorithm, so this might need further refinement. I considered fixing that now, but decided to wait and see whether it's a real problem.
Testing
Ran various subsets of tests with different seeds and observed the above properties.
Additional context
I've opened a similar PR on Catch2 directly, but I wanted to backport the change here because @wapcaplet has been working on resolving failures related to randomly ordered tests, and this should help.