What constitutes an acceptable keyword? #194
Comments
Hey, sorry for the lack of response; I was largely away last year. TBH I have not thought about this at length, but I agree with what you've written here. If pull requests were sent for these keywords, I'd accept them all.
I agree. I'd be happy to accept a PR for this if anyone's willing to send one.
@jacobwhall I'm planning to fork this and start an emoji autocomplete project as well. Have you settled on a fast way to search through the aliases? I was thinking about doing something like a filter, but I'm not sure if there are faster options out there. UPDATE: I did some performance tests, and I think for best performance I'm going to flatten the arrays into strings, since finding partial text matches in strings is faster than doing the same operation on arrays. https://jsbench.me/zql58n0oew/1 When doing a partial match on every keystroke, every bit of performance counts ^^.
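For readers who want to picture the flattening idea, here is a minimal sketch. It is not the code behind the linked benchmark; the sample data, the `flattened` index, and the `searchEmoji` helper are all hypothetical names made up for illustration, and the data only mimics the character-to-keywords shape of emojilib v3.

```javascript
// Hypothetical sample in the emojilib v3 shape: emoji character -> keyword array.
const emojiKeywords = {
  "📱": ["mobile_phone", "technology", "apple", "gadgets", "dial"],
  "🛫": ["airplane_departure", "airport", "flight"],
  "🍆": ["eggplant", "vegetable", "aubergine"],
};

// Flatten once, up front: each keyword array becomes a single string.
// A newline delimiter keeps a typed query from matching across two keywords.
const flattened = Object.entries(emojiKeywords).map(([char, keywords]) => ({
  char,
  haystack: keywords.join("\n"),
}));

// Partial match on every keystroke: one String.prototype.includes call per emoji.
function searchEmoji(query) {
  const needle = query.trim().toLowerCase();
  if (!needle) return [];
  return flattened
    .filter(({ haystack }) => haystack.includes(needle))
    .map(({ char }) => char);
}

console.log(searchEmoji("phon"));
console.log(searchEmoji("air"));
```

Whether this actually beats an array-based search depends on the data size and the engine, so it is worth re-running a benchmark against real data before committing to it.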
@thdoan sounds like you've done as much as I have. I wrote an emoji picker in Python that you're welcome to check out. The search works surprisingly well!
@jacobwhall cool, I'm experimenting with an emoji autocomplete by leveraging the browser's native datalist functionality. However, I've decided to start my emojis map from scratch based on https://emojipedia.org/ (all tedious manual work since they closed their API). We'll see how it goes.
+1, having docs on this would be great. I'm working on omnidan/node-emoji#132 to bring emojilib/dist/emoji-en-US.json Lines 986 to 991 in e8e9a84
I wrote a quick script to find discrepancies:
// npm i emojilib-2@npm:emojilib@2 emojilib-3@npm:emojilib@3
const { lib: emojisV2 } = await import("emojilib-2");
const { default: emojisV3 } = await import("emojilib-3", {
  assert: { type: "json" },
});

const missing = [];
const missingIgnoringAliases = [];

for (const [nameV2, detailsV2] of Object.entries(emojisV2)) {
  // v3 is keyed by the emoji character; its value is the keyword array.
  const detailsV3 = emojisV3[detailsV2.char];

  // The v2 name survived as a v3 keyword, so there is nothing to report.
  if (detailsV3?.includes(nameV2)) {
    continue;
  }

  const complaint = { nameV2, detailsV2, detailsV3 };
  missing.push(complaint);

  // Skip a few naming patterns that changed wholesale between versions.
  const primaryAlias = detailsV3?.[0];
  if (
    primaryAlias &&
    !/^(?:flag|two|smiling_face_with)_|_face$/.test(primaryAlias)
  ) {
    missingIgnoringAliases.push(complaint);
  }
}

console.table({
  "Missing in general": missing.length,
  "Missing ignoring a few quick aliases": missingIgnoringAliases.length,
});
@muan is there a description anywhere of how #178's lists were generated? Or, if not, could you speak to how you generated them?
I believe I had some hacked-together local scripts, so I don't recall the exact differences. But here's what might have happened: previously this project was built exclusively for GitHub shortcodes at our internal hackathon, and with v3 I decided to move away from that, so the primary key became the official Unicode names, which would explain why. IIRC, the official name of an emoji sometimes changes between versions too (gun -> water gun), which is why I made the character the key now. I feel like I would/should have done the work to compare and keep the GitHub shortcodes, but I guess I did not. So to add them all back, a name/alias comparison between GitHub's set and the Unicode set could potentially do the trick.
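A rough sketch of what such a name/alias comparison might look like follows. Both data sets below are hypothetical stand-ins (a shortcode-to-character map for GitHub, a character-to-keywords map in the v3 shape), and `findMissingShortcodes` is an illustrative helper, not something that exists in this repo.

```javascript
// Hypothetical GitHub shortcode set: shortcode -> emoji character.
const githubShortcodes = {
  "+1": "👍",
  thumbsup: "👍",
  gun: "🔫",
  hankey: "💩",
};

// Hypothetical emojilib v3-style data: emoji character -> keywords.
const emojilibKeywords = {
  "👍": ["thumbs_up", "thumbsup", "yes"],
  "🔫": ["water_pistol", "violence"],
};

// Report every shortcode that is not among the keywords for its character.
// `known` is false when the character has no entry at all in the keyword data.
function findMissingShortcodes(shortcodes, keywordsByChar) {
  const missing = [];
  for (const [shortcode, char] of Object.entries(shortcodes)) {
    const keywords = keywordsByChar[char];
    if (!keywords || !keywords.includes(shortcode)) {
      missing.push({ shortcode, char, known: keywords !== undefined });
    }
  }
  return missing;
}

console.table(findMissingShortcodes(githubShortcodes, emojilibKeywords));
```

The resulting list could then be reviewed by hand or appended to each character's keywords, depending on how conservative the maintainers want to be.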
OK! Sorry for taking so long on this - I wanted to really think through the problem space. As in: what's a "keyword"? Using the 🛫 emoji as an example, I think there are really 2-3 use cases for emoji keywords:
Ideally I'd propose emojilib separate at least 🆔 identity from 🔗 relation keywords. Some users will want only identity, e.g. +1 to @muan's suggestion in #194 (comment) of a comparison. I'd say a programmatic approach would be the easiest & least controversy-risking approach for
As for setting up that programmatic approach... we can get halfway there. I made a standalone Looking at the data that's in
Full comparison on: https://github.com/JoshuaKGoldberg/repros/tree/emojilib-emojipedia-keywords-comparison. My next task will be trying to similarly source the 🔗 relation keywords programmatically. That way we can make a script that populates
Update: I have a proposal for your review now @muan! 🙌 Preview the full proposal of changes here: This follows what I proposed in the last comment: that Using 🛫 as an example, here's what that would look like:
Full comparison and proposal tables on: https://github.com/JoshuaKGoldberg/repros/tree/emojilib-platforms-keywords-comparison. Unless directed otherwise, I'll send a big PR updating the keywords in this repo... soon. Hopefully later this month. Note that the following emojis have significantly fewer keywords in the proposed changes:
None of the platforms in
Thank you for your work on this @JoshuaKGoldberg
I suggest that we integrate individual keyword contributions into this new workflow. I think it's worth retaining the keywords from this project for the example emojis you provided. Contributions to this project could continue to add common-sense keywords that may have been overlooked by Unicode, Emojipedia, and other sources.
Makes sense! I sent #226 as a draft for reference that only augments, rather than removes.
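To illustrate what "augments, rather than removes" could mean in practice, here is a small sketch of a union-style merge. It is an assumption about the general idea, not a description of how #226 is implemented; `augmentKeywords` and both sample objects are hypothetical.

```javascript
// Merge proposed keywords into existing ones without dropping anything.
// Existing keywords keep their order; new ones are appended; duplicates are removed.
function augmentKeywords(existing, proposed) {
  const merged = {};
  const chars = new Set([...Object.keys(existing), ...Object.keys(proposed)]);
  for (const char of chars) {
    const current = existing[char] ?? [];
    const additions = proposed[char] ?? [];
    merged[char] = [...new Set([...current, ...additions])];
  }
  return merged;
}

// Hypothetical inputs in the character -> keywords shape.
const existingKeywords = { "🛫": ["airplane_departure", "airport"] };
const proposedKeywords = {
  "🛫": ["airplane_departure", "takeoff"],
  "🛬": ["airplane_arrival"],
};

console.log(augmentKeywords(existingKeywords, proposedKeywords));
```

With this approach, hand-contributed keywords survive any regeneration from external sources, at the cost of never pruning stale ones automatically.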
Is there any indication of when #226 will be moved out of draft or merged? I'm interested in seeing a resolution to this in the upstream lib: omnidan/node-emoji#132.
Any progress on this, guys? I like the idea of having a strict workflow here instead of random keyword proposals.
First of all, thank you for maintaining this repository!
I wrote a rudimentary emoji search program using your data, and noticed that, for example, "poop" does not match any of the keywords for 💩:
emojilib/dist/emoji-en-US.json
Lines 746 to 753 in f3169dc
There are a lot of other poop synonyms listed here, so I feel that "poop" would be an uncontroversial addition. But there are many synonyms for poop, and we might not want to include them all?
Another example I ran into was for 📱:
emojilib/dist/emoji-en-US.json
Lines 7259 to 7265 in f3169dc
The first phrase I'd say if you asked me to identify this emoji is "cell phone." However, none of the keywords for this emoji would match "cell." Would it be appropriate to add "cell," "cell_phone," or "cellular_phone?" Are non-official keywords that use underscores OK, or should substrings like "phone" be added as well as "mobile_phone?"
Finally, and I write this sincerely, I'd like to discuss 🍆:
emojilib/dist/emoji-en-US.json
Lines 4269 to 4275 in f3169dc
This emoji is often used to signify a penis. Would it be acceptable to add "dick" or "penis" to the list of associated keywords for this emoji? I think that doing so would better reflect common usage, but might stray too far from Unicode's "intended use" for the emoji (if that's a thing).
I suggest that a section be added to CONTRIBUTING.md or README.md that gives guidance to future contributors about questions like these.
…and that's how I posted a GitHub issue about poop, cell phones, and penises 🤪