Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better support string values for scores #785

Merged
merged 2 commits into from
Oct 31, 2024
Merged

Better support string values for scores #785

merged 2 commits into from
Oct 31, 2024

Conversation

dragonstyle
Copy link
Collaborator

If a user produces a score whose value is a string, when that value is ‘reduced’ using the default mean reducer, it is coerced to a float. For strings thing means when the Score arrives at the custom metric, it will carry the reduced value which has been coerced to a float.

This fix is minimal - it implements support for string values in the mean reducer, providing the most common string value (or the first string value if non are most common).

Fixes #775

This PR contains:

  • New features
  • Changes to dev-tools e.g. CI config / github tooling
  • Docs
  • Bug fixes
  • Code refactor

What is the current behavior? (You can also link to an open issue here)

Scorers that return string values run through the reduction process and the string values are reduced to a float by the mean reducer.

What is the new behavior?

The mean reducer will reduce strings to the most common value, meaning that the string value is preserved across the mean reducer.

Does this PR introduce a breaking change? (What changes might users need to make in their application due to this PR?)

No

Other information:

If a user produces a score whose value is a string, when that value is ‘reduced’ using the default mean reducer, it is coerced to a float. For strings thing means when the Score arrives at the custom metric, it will carry the reduced value which has been coerced to a float.

This fix is minimal - it implements support for string values in the mean reducer, providing the most common string value (or the first string value if non are most common).

Fixes #775
@dragonstyle dragonstyle requested a review from jjallaire October 31, 2024 15:13
@jjallaire jjallaire merged commit 79f1d7e into main Oct 31, 2024
9 checks passed
@jjallaire jjallaire deleted the bugfix/775 branch October 31, 2024 17:11
dragonstyle added a commit that referenced this pull request Nov 4, 2024
This reverts commit 79f1d7e.

# Conflicts:
#	CHANGELOG.md
jjallaire-aisi pushed a commit that referenced this pull request Nov 4, 2024
* Revert "Better support string values for scores (#785)"

This reverts commit 79f1d7e.

# Conflicts:
#	CHANGELOG.md

* Add test
jjallaire-aisi added a commit that referenced this pull request Nov 4, 2024
* Revert "Better support string values for scores (#785)"

This reverts commit 79f1d7e.

# Conflicts:
#	CHANGELOG.md

* Add test

* Add test

* Don’t run reducer against single epoch samples

At the lowest level (just before reducing), determine whether there is more than one sample for the sample_id and do not actually reduce the value if there is only one.

This prevents side effects of reducers from happening to single samples (side effects include: mutating the scores - strings become floats, for example, due to the mean reducer being applied; mutating the other score fields like explanation and metadata to reduced values; etc).

This overall seems good (no side effects when scoring single epoch evaluations), but could introduce bugs if anyone was relying upon those mutations (for example if they wrote string score values, but were downstream expecting those to be forced to floats by the reducer).

---------

Co-authored-by: jjallaire-aisi <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Question] Custom metric type
2 participants