Non-Dummy variable codings in ContinuousEncoder #534

Open
ParadaCarleton opened this issue Oct 15, 2023 · 2 comments
Labels
enhancement New feature or request Good GSoC project

Comments


ParadaCarleton commented Oct 15, 2023

Right now, MLJModels.jl has ContinuousEncoder, which automatically transforms categorical features into one of two codings:

  1. Dummy variable coding (one-hot with last category dropped)
  2. Redundant variable coding (one-hot)

However, these aren't always the most useful codings when the model is regularized, and many others are in common use. For example, a common way to encode ordinal variables is sequential difference coding. With this coding, each coefficient is the difference between two adjacent categories, so regularization pulls adjacent categories closer together. This can improve model performance relative to either treating ordered variables as categorical (discarding ordering information) or treating them as continuous (assuming equal spacing between levels, which is often incorrect). Similarly, effect coding lets you regularize categories towards the grand mean, rather than regularizing every category towards 0 (redundant coding) or towards one reference category (dummy coding).
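
To make the difference between these codings concrete, here is a minimal, language-agnostic sketch in plain Python. It is not MLJModels.jl code, and the function names (dummy_coding, effect_coding, seq_diff_coding, level_values) are hypothetical, chosen only for illustration. The sequential difference matrix uses the centred-column convention (the one R's MASS contr.sdif follows), which is an assumption about which variant is meant here. The sketch sets one coefficient to zero under each coding to show what a regularizer would be pulling that level towards.

```python
from fractions import Fraction


def dummy_coding(k):
    """One-hot with the last level dropped; the last level is the reference."""
    return [[Fraction(int(i == j)) for j in range(k - 1)] for i in range(k)]


def effect_coding(k):
    """Same as dummy coding, except the last level's row is all -1."""
    rows = dummy_coding(k)
    rows[-1] = [Fraction(-1)] * (k - 1)
    return rows


def seq_diff_coding(k):
    """Column j contrasts level j+1 with level j (0-based), centred columns."""
    return [
        [Fraction(j + 1, k) if i > j else Fraction(j + 1 - k, k)
         for j in range(k - 1)]
        for i in range(k)
    ]


def level_values(coding, intercept, coefs):
    """Fitted value for each level: intercept + (coding row) . coefs."""
    return [intercept + sum(c * b for c, b in zip(row, coefs)) for row in coding]


# Four levels; the second coefficient has been shrunk all the way to zero.
intercept, coefs = 10, [2, 0, -1]

dummy = level_values(dummy_coding(4), intercept, coefs)
effect = level_values(effect_coding(4), intercept, coefs)
seqdiff = level_values(seq_diff_coding(4), intercept, coefs)

# Dummy: level 1 collapses onto the reference (last) level.
print([float(v) for v in dummy])    # [12.0, 10.0, 9.0, 10.0]
# Effect: level 1 collapses onto the grand mean of the levels (10).
print([float(v) for v in effect])   # [12.0, 10.0, 9.0, 9.0]
# Sequential difference: level 2 collapses onto its neighbour, level 1.
print([float(v) for v in seqdiff])  # [8.75, 10.75, 10.75, 9.75]
```

In all three cases the coefficient vector is the same; only the coding matrix changes, and with it the meaning of "shrink this coefficient to zero".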

Member

ablaom commented Oct 17, 2023

Sounds like a good idea to me. PR welcome 😉

@ablaom ablaom added Good GSoC project enhancement New feature or request labels Oct 17, 2023
Member

ablaom commented Feb 12, 2024

@github-project-automation github-project-automation bot moved this to priority low / involved in General Aug 30, 2024
Projects
Status: priority low / involved
Development

No branches or pull requests

2 participants