Better support string values for scores #785

dragonstyle · 2024-10-31T15:12:58Z

If a user produces a score whose value is a string, when that value is ‘reduced’ using the default mean reducer, it is coerced to a float. For strings thing means when the Score arrives at the custom metric, it will carry the reduced value which has been coerced to a float.

This fix is minimal - it implements support for string values in the mean reducer, providing the most common string value (or the first string value if non are most common).

Fixes #775

This PR contains:

What is the current behavior? (You can also link to an open issue here)

Scorers that return string values run through the reduction process and the string values are reduced to a float by the mean reducer.

What is the new behavior?

The mean reducer will reduce strings to the most common value, meaning that the string value is preserved across the mean reducer.

Does this PR introduce a breaking change? (What changes might users need to make in their application due to this PR?)

No

Other information:

If a user produces a score whose value is a string, when that value is ‘reduced’ using the default mean reducer, it is coerced to a float. For strings thing means when the Score arrives at the custom metric, it will carry the reduced value which has been coerced to a float. This fix is minimal - it implements support for string values in the mean reducer, providing the most common string value (or the first string value if non are most common). Fixes #775

This reverts commit 79f1d7e. # Conflicts: # CHANGELOG.md

* Revert "Better support string values for scores (#785)" This reverts commit 79f1d7e. # Conflicts: # CHANGELOG.md * Add test

* Revert "Better support string values for scores (#785)" This reverts commit 79f1d7e. # Conflicts: # CHANGELOG.md * Add test * Add test * Don’t run reducer against single epoch samples At the lowest level (just before reducing), determine whether there is more than one sample for the sample_id and do not actually reduce the value if there is only one. This prevents side effects of reducers from happening to single samples (side effects include: mutating the scores - strings become floats, for example, due to the mean reducer being applied; mutating the other score fields like explanation and metadata to reduced values; etc). This overall seems good (no side effects when scoring single epoch evaluations), but could introduce bugs if anyone was relying upon those mutations (for example if they wrote string score values, but were downstream expecting those to be forced to floats by the reducer). --------- Co-authored-by: jjallaire-aisi <[email protected]>

dragonstyle requested a review from jjallaire October 31, 2024 15:13

Merge branch 'main' into bugfix/775

9df6e65

jjallaire approved these changes Oct 31, 2024

View reviewed changes

jjallaire merged commit 79f1d7e into main Oct 31, 2024
9 checks passed

jjallaire deleted the bugfix/775 branch October 31, 2024 17:11

dragonstyle added a commit that referenced this pull request Nov 4, 2024

Revert "Better support string values for scores (#785)"

3972d59

This reverts commit 79f1d7e. # Conflicts: # CHANGELOG.md

jjallaire-aisi pushed a commit that referenced this pull request Nov 4, 2024

Revert "Better support string values for scores (#785)" (#797)

6b2799b

* Revert "Better support string values for scores (#785)" This reverts commit 79f1d7e. # Conflicts: # CHANGELOG.md * Add test

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Better support string values for scores #785

Better support string values for scores #785

dragonstyle commented Oct 31, 2024

Better support string values for scores #785

Better support string values for scores #785

Conversation

dragonstyle commented Oct 31, 2024

This PR contains:

What is the current behavior? (You can also link to an open issue here)

What is the new behavior?

Does this PR introduce a breaking change? (What changes might users need to make in their application due to this PR?)

Other information: