Clustering (hierarchical vs. others): inconsistent handling of items that cannot be assigned to a cluster #6954

Open
wvdvegte opened this issue Dec 13, 2024 · 0 comments
Labels
bug report Bug is reported by user, not yet confirmed by the core team

wvdvegte commented Dec 13, 2024

What's wrong?
For records that cannot be clustered, most clustering widgets put a "?" in the Cluster column. Hierarchical Clustering is the exception: it assigns a cluster number to every record. As a result, when the dividing line is slid to the right, "clusters" containing only one record may appear. These are arguably not clusters at all (and if they are, the other clustering widgets should also give one-item clusters their own cluster number).
This is problematic when comparing different clustering approaches applied to the same dataset. Silhouette Plot, for instance, gives a warning if there are "?"s in the Cluster column, but treats one-item clusters from Hierarchical Clustering as true clusters.
I propose that Hierarchical Clustering put a "?" in the Cluster column when a "cluster" contains only one record.
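
To make the proposed behaviour concrete, here is a minimal sketch in plain Python. It is not Orange's code or API: the function name `mask_singleton_clusters`, the `missing` parameter, and the label strings are hypothetical and only illustrate the idea of replacing one-record clusters with "?".

```python
from collections import Counter


def mask_singleton_clusters(labels, missing="?"):
    """Return a copy of `labels` where clusters with one member become `missing`.

    Hypothetical helper for illustration only; not part of Orange.
    """
    # Count members per cluster, ignoring records that are already unassigned.
    counts = Counter(label for label in labels if label != missing)
    return [
        missing if label == missing or counts[label] == 1 else label
        for label in labels
    ]


# Example: C2 and C4 each contain a single record.
labels = ["C1", "C1", "C2", "C3", "C3", "C3", "C4"]
masked = mask_singleton_clusters(labels)
print(masked)
```

With output like this, one-record clusters would be marked the same way as unassigned records in the other clustering widgets, so downstream widgets such as Silhouette Plot could handle both cases consistently.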
