feat: tag clustering using ML #673
Tests appear to be failing, possibly due to some missing files or paths?
I will look into this. I ran the tests in my environment and they passed, so I think you might be right about the missing files.
Good work @hwelsters! I included a few inline comments after a preliminary review of the code. I'll add more comments later after testing the actual functionality.
Attempts to close comses/planning#125
Squashed commits and resolved merge conflicts.
Summary
Perform tag clustering and gazetteering with dedupe.
Features
Tag clustering is needed to create the initial canonical list.
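The PR performs this clustering with dedupe, whose API is not reproduced here. As a rough illustration of what clustering tags into candidate canonical groups means, the sketch below swaps in a much simpler technique: greedy string-similarity clustering with the standard library's difflib.SequenceMatcher. The normalize rules, the 0.85 threshold, and the "shortest name wins" canonical rule are hypothetical choices, not the PR's.

```python
from difflib import SequenceMatcher


def normalize(tag: str) -> str:
    """Lowercase and collapse '-', '_' and whitespace so 'Agent-Based' and 'agent based' compare equal."""
    return " ".join(tag.lower().replace("-", " ").replace("_", " ").split())


def similar(a: str, b: str, threshold: float) -> bool:
    """True if the normalized names have a SequenceMatcher ratio of at least `threshold`."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold


def cluster_tags(tags: list[str], threshold: float = 0.85) -> list[list[str]]:
    """Greedy clustering (a simple stand-in for dedupe, not its algorithm).

    Each tag joins the first cluster whose first member (the seed) it resembles;
    otherwise it starts a new cluster.
    """
    clusters: list[list[str]] = []
    for tag in tags:
        for cluster in clusters:
            if similar(tag, cluster[0], threshold):
                cluster.append(tag)
                break
        else:
            clusters.append([tag])
    return clusters


def pick_canonical(cluster: list[str]) -> str:
    """Hypothetical rule: the shortest name wins, ties broken by string order."""
    return min(cluster, key=lambda t: (len(t), t))
```

For example, cluster_tags(["Agent-Based Model", "agent based model", "agent-based models", "NetLogo", "netlogo"]) yields two clusters, whose canonical names under this rule are "Agent-Based Model" and "NetLogo". A curator would still review these groups, which is the role of the commands below.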
Curator commands
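Created four new commands: one for clustering tags, one for editing the resulting clusters, one for gazetteering / canonicalization (mapping tags to canonical tags), and one for modifying the canonical list.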
1. curator_cluster_tags
This creates TagCluster objects, which can then be edited with curator_edit_clusters.
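The PR does not show the TagCluster model's fields. A hypothetical in-memory stand-in might hold a canonical name plus its member tags and expose the tag-to-canonical mapping that is eventually saved. The class name TagClusterSketch, its fields, and the build_clusters helper are illustrative assumptions, not the real Django model.

```python
from dataclasses import dataclass, field


@dataclass
class TagClusterSketch:
    """Hypothetical stand-in for a TagCluster; field names are assumptions."""
    canonical_name: str
    tags: list[str] = field(default_factory=list)

    def to_mapping(self) -> dict[str, str]:
        """Map every tag in the cluster to the cluster's canonical name."""
        return {tag: self.canonical_name for tag in self.tags}


def build_clusters(groups: list[list[str]]) -> list[TagClusterSketch]:
    """Turn non-empty groups of similar tags into clusters, seeding the canonical name with the first tag."""
    return [TagClusterSketch(canonical_name=group[0], tags=list(group)) for group in groups if group]
```

Keeping the canonical name separate from the member list lets a curator rename it without touching the tags themselves.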
2. curator_edit_clusters
This command lets the user edit clusters and then save the mappings to the database.
While editing a cluster, there are five options:
(c)hange canonical tag name
- Lets you change the name of the canonical tag.
(a)dd tags
- Lets you add tags to the cluster.
(r)emove tags
- Lets you remove tags from the cluster.
(s)ave
- Saves the cluster to the database.
(f)inish
- Does not save the cluster; it just means you are done changing it and are moving on. I decided not to autosave because in some cases the user may want to discard the cluster instead of saving it.
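A minimal sketch of the edit loop behind these options, assuming they are applied in order until (f)inish and that saving is delegated to a callback. The edit_cluster function, its (option, argument) command pairs, and the save callback are hypothetical; the real command reads interactive input and writes to the database.

```python
from typing import Callable, Iterable


def edit_cluster(
    canonical: str,
    tags: set[str],
    commands: Iterable[tuple[str, str]],
    save: Callable[[str, set[str]], None],
) -> tuple[str, set[str]]:
    """Apply (option, argument) pairs to a cluster until 'f' is seen.

    c = change canonical name, a = add tag, r = remove tag,
    s = save via the `save` callback, f = finish without saving.
    """
    tags = set(tags)  # work on a copy so the caller's set is untouched
    for option, arg in commands:
        if option == "c":
            canonical = arg
        elif option == "a":
            tags.add(arg)
        elif option == "r":
            tags.discard(arg)
        elif option == "s":
            save(canonical, set(tags))  # only an explicit (s)ave persists anything
        elif option == "f":
            break  # done with this cluster; nothing is saved automatically
        else:
            raise ValueError(f"unknown option: {option!r}")
    return canonical, tags
```

Taking commands as an iterable rather than calling input() directly keeps the loop testable; an interactive command could feed it parsed user input. Finishing without an (s)ave leaves the database untouched, matching the no-autosave choice described above.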
3. curator_map_tags
This command attempts to map each tag to a canonical tag if it isn't already mapped.
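dedupe provides a Gazetteer for matching messy records against a canonical set, but its API is not shown in the PR. This sketch substitutes difflib.get_close_matches from the standard library to illustrate the same flow: skip tags that are already mapped, match the rest against the canonical list, and leave anything below the cutoff unmapped for a curator. The map_tags name and the 0.8 cutoff are assumptions.

```python
from difflib import get_close_matches


def map_tags(
    tags: list[str],
    canonical_tags: list[str],
    existing: dict[str, str],
    cutoff: float = 0.8,
) -> dict[str, str]:
    """Return new tag -> canonical tag mappings for tags not yet in `existing`.

    Matching is case-insensitive; tags with no canonical match at or above
    `cutoff` are left out of the result (i.e. stay unmapped).
    """
    by_lower = {c.lower(): c for c in canonical_tags}
    new: dict[str, str] = {}
    for tag in tags:
        if tag in existing:
            continue  # already mapped; leave it alone
        match = get_close_matches(tag.lower(), list(by_lower), n=1, cutoff=cutoff)
        if match:
            new[tag] = by_lower[match[0]]
    return new
```

For instance, map_tags(["Net Logo", "Agent based models", "Python", "NetLogo"], ["NetLogo", "Agent-Based Model"], {"NetLogo": "NetLogo"}) maps the first two tags, skips NetLogo because it is already mapped, and leaves Python unmapped.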
4. curator_modify_cannon
Use this command to modify the canonical list.
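The PR does not say how modifying the canonical list affects tags that are already mapped. One possible policy, sketched here with hypothetical helpers over an in-memory set and dict rather than the real models, is that renaming a canonical tag repoints its mappings and removing one leaves its tags unmapped, so that a later mapping pass could revisit them.

```python
def rename_canonical(canonical: set[str], mapping: dict[str, str], old: str, new: str) -> None:
    """Rename a canonical tag in place and repoint every tag mapped to it (hypothetical policy)."""
    if old not in canonical:
        raise KeyError(old)
    canonical.discard(old)
    canonical.add(new)
    for tag, target in mapping.items():
        if target == old:
            mapping[tag] = new  # updating existing keys while iterating is safe


def remove_canonical(canonical: set[str], mapping: dict[str, str], name: str) -> None:
    """Remove a canonical tag and drop mappings that pointed to it (those tags become unmapped)."""
    canonical.discard(name)
    for tag in [t for t, target in mapping.items() if target == name]:
        del mapping[tag]
```

Dropping dangling mappings, rather than leaving them pointing at a name that no longer exists, keeps every remaining mapping valid.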
Tests
Wrote tests using Django's test framework.