add many missing emoji aliases #41

Merged
merged 2 commits into from
Dec 7, 2019


@gencer gencer commented Nov 5, 2019

I've added many missing aliases to support rendering more emojis, especially via emojione. The aliases come from several sources. I generated them with a script, so there are no duplicate aliases.

@gencer gencer changed the title add all missing emojione aliases add many missing emoji aliases Nov 5, 2019
@gencer gencer force-pushed the master branch 3 times, most recently from ed183f4 to e8fa3b4 Compare November 5, 2019 00:36
@enzoferey
Collaborator

Hi @gencer ! Thanks for the PR 🔥

Looks like most of them are duplicates of existing aliases under a different name (which is fine, of course). It's pretty hard for me to tell from such a big list whether there are any new emoji (which is what adds the most value). Are there any?

Also, could you please tell us the data source you used and include the script you used to retrieve the data?

We currently have a script that already automates this process for us, so let's see how we can improve it.

@gencer
Author

gencer commented Nov 5, 2019

> Hi @gencer ! Thanks for the PR 🔥

Hi @enzoferey! Glad you liked the PR.

> Looks like most of them are duplicates of existing aliases under a different name (which is fine, of course). It's pretty hard for me to tell from such a big list whether there are any new emoji (which is what adds the most value). Are there any?

To be honest, I'm not sure. But I can filter out the new emojis, append them separately at the end, and mark them off with a comment if you like.

> Also, could you please tell us the data source you used and include the script you used to retrieve the data?
> We currently have a script that already automates this process for us, so let's see how we can improve it.

I hadn't seen your script until you mentioned it here, actually. I wrote a silly, naive Ruby script that loads 3 different emoji JSON lists, iterates over all of them, and finally de-duplicates the entries by name. EmojiOne also uses the description as an alias: they replace spaces with underscores and exclude anything that contains + and . signs. The sources of my emoji aliases are located here and here. The third source is your aliases.js. I leave the aliases.js content at the top, then append all missing aliases to the end of the object. You may notice that one of the emoji sources is for v4.0.0. Interestingly, EmojiOne v2 renders aliases from this source too. It just works, especially for the aliases derived from the description.
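The approach described above could be sketched roughly as follows. This is JavaScript rather than the original Ruby, and it is not the author's script, which is not reproduced in this thread. The function names `aliasFromDescription` and `mergeUniqueAliases` are hypothetical, and lowercasing the description is an assumption; the thread only mentions replacing spaces with underscores and skipping anything containing + or . signs.

```javascript
// Hypothetical helper: derive an alias from an emoji description.
// Spaces become underscores; descriptions containing "+" or "." are skipped.
// Lowercasing is an assumption, not something stated in the thread.
function aliasFromDescription(description) {
  if (/[+.]/.test(description)) return null;
  return description.trim().toLowerCase().replace(/\s+/g, "_");
}

// Hypothetical helper: merge several { alias: emoji } maps.
// The first source to define an alias wins, so entries from the
// existing aliases.js (passed first) are never overwritten.
function mergeUniqueAliases(...sources) {
  const merged = new Map();
  for (const source of sources) {
    for (const [alias, emoji] of Object.entries(source)) {
      if (!merged.has(alias)) merged.set(alias, emoji);
    }
  }
  return Object.fromEntries(merged);
}
```

Passing the existing aliases first mirrors the "keep aliases.js on top, append what is missing" behaviour described in the comment.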

Please note that I use draft-js-emoji-plugin, which uses EmojiOne v2. When I insert emojis into the editor and then render them in the body, many of them are not rendered because their aliases are unknown. With my additions, all of my aliases render properly.

Here is the Ruby script responsible for the alias mapping. (As I said, it's a very silly, naive script.)

In fact, EmojiOne is not what defines the aliases; the Unicode representations do, and those aliases exist.

@enzoferey
Collaborator

Hello again @gencer ! Sorry for the delay.

Thanks for the sources. Is there any reason why you use both the 2.7.7 and 4.0.0 versions of the same thing? Shouldn't the emoji in 4.0.0 include the ones in 2.7.7? Also, there is a 5.1.1 version available (which I assume contains the latest ones?).

We would love to solve these emoji source issues once and for all and be able to update them automatically whenever new ones come out, but we need to be very picky about the sources to ensure that we don't miss any emoji and that we use the right aliases.

I'll keep your sources as a reference, and I hope to be able to spend some time in the next few days doing some research on the topic. Any help would be greatly appreciated 👍

@enzoferey
Collaborator

enzoferey commented Nov 11, 2019

Hi there !

Summary of my research:

I found out about https://github.com/muan/unicode-emoji-json. It was just started by the creator of https://github.com/muan/emojilib as a way to scrape the official Unicode documents. It looks pretty solid.

The issue is with the aliases. Aliases have been made up by the community over time, and nowadays there are many inconsistencies between emoji libraries (for example muan/emojilib#135). As pointed out here, it looks like emojilib will start using the sequence instead of made-up names. That makes sense on one hand, but on the other it will make it harder for people to find the emojis they are looking for. Think about the aliases being used in Slack, GitHub, etc. I'm constantly switching between apps and I always struggle to find my emojis because the aliases are different. So I started searching for emoji alias datasets...

It looks like https://github.com/github/gemoji/blob/master/db/emoji.json is the best shot out there (seems like @gencer took some inspiration from them for the script). They parse the official docs as unicode-emoji-json does, and on top of that they have built up over time a collection of aliases for each emoji (starting from an initial value of the sequence).

What I'm suggesting to do is:

  1. Use https://github.com/muan/unicode-emoji-json/blob/master/data-by-emoji.json to get all emojis and their sequence:
// data/aliases/baseAliases.json
{
  "😀": ["grinning_face"],
  ...
}
  2. Use https://github.com/github/gemoji/blob/master/db/emoji.json to grab their aliases as an initial reference to extend the previous dataset:
// data/aliases/gemojiAliases.json
{
  "😀": ["grinning_face", "smile"],
  ...
}
  3. Create a customAliases.json file to which people will be able to contribute:
// data/aliases/customAliases.json
{
  "😀": ["custom_alias_1", "custom_alias_2"],
  ...
}
  4. Build our final aliases.js file by merging the baseAliases.json + gemojiAliases.json + customAliases.json arrays via a script that warns you if some alias is duplicated:
// data/aliases.js
module.exports = {
  "grinning_face": "😀",
  "smile": "😀",
  "custom_alias_1": "😀",
  "custom_alias_2": "😀",
  ...
};

This setup allows us to:

  • Always have the latest emojis thanks to unicode-emoji-json.
  • Profit from alias datasets like gemoji or any other library out there by just parsing their aliases into our format.
  • Easily accept contributions from people who want to enrich our alias collection.
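As a rough illustration of the plan above, here is a minimal sketch of the conversion and merge steps. It is not the pipeline that was eventually implemented. `fromGemoji` and `buildAliases` are hypothetical names, the gemoji entry shape (`emoji` plus an `aliases` array) is an assumption about db/emoji.json, and the inline sample data stands in for the three JSON files. It mirrors the example values used in this comment rather than real dataset contents, with one made-up conflicting entry added to trigger a warning. The sketch treats an alias repeated for the same emoji as harmless (the example above lists grinning_face in two datasets) and warns only when two different emojis claim the same alias.

```javascript
// Convert gemoji-style entries (assumed shape: [{ emoji, aliases }, ...])
// into the per-emoji format { "😀": ["alias", ...] }.
function fromGemoji(entries) {
  const byEmoji = {};
  for (const entry of entries) {
    if (!entry.emoji) continue; // skip entries without a Unicode character
    byEmoji[entry.emoji] = [...(entry.aliases || [])];
  }
  return byEmoji;
}

// Merge per-emoji alias lists into one { alias: emoji } map.
// Earlier datasets take precedence; conflicts are collected as warnings.
function buildAliases(...datasets) {
  const aliases = new Map(); // alias -> emoji
  const warnings = [];
  for (const dataset of datasets) {
    for (const [emoji, names] of Object.entries(dataset)) {
      for (const name of names) {
        const existing = aliases.get(name);
        if (existing === undefined) {
          aliases.set(name, emoji);
        } else if (existing !== emoji) {
          warnings.push(`alias "${name}" is claimed by both ${existing} and ${emoji}`);
        }
        // Same alias listed again for the same emoji: nothing to do.
      }
    }
  }
  return { aliases: Object.fromEntries(aliases), warnings };
}

// Sample data standing in for baseAliases.json, gemoji's emoji.json
// and customAliases.json (the "😄" entry is a made-up conflict).
const baseAliases = { "😀": ["grinning_face"] };
const gemojiAliases = fromGemoji([{ emoji: "😀", aliases: ["grinning_face", "smile"] }]);
const customAliases = { "😀": ["custom_alias_1", "custom_alias_2"], "😄": ["smile"] };

const { aliases, warnings } = buildAliases(baseAliases, gemojiAliases, customAliases);
warnings.forEach((message) => console.warn(message));
console.log(`module.exports = ${JSON.stringify(aliases, null, 2)};`);
```

Keeping the first definition of a conflicting alias is a design choice of this sketch; a real script could just as well fail the build so that the conflict has to be resolved by hand.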

Waiting for your feedback guys. cc @tommoor.

@gencer
Author

gencer commented Nov 12, 2019

LGTM. As you said, with this scheme we will always have the latest changes available.

@abnersajr

abnersajr commented Dec 3, 2019

@enzoferey, reading through the conversation, I think this flow update is a good idea.
Why not open another PR for this update and let this one be merged for now?

What do you think, @gencer and @tommoor?

@enzoferey
Collaborator

Hi @abnersajr !

Creating the other PR would take just a couple of hours, so merging this PR wouldn’t bring much value in my opinion. But I’m not against it 👍🏻

The problem is that @tommoor doesn’t come around here very often, and I don’t have rights to merge pull requests, let alone publish new versions of the library. That’s why I’m asking about specs before opening the PR. I don’t want to spend a couple of hours only to have nothing merged for months.

@tommoor
Owner

tommoor commented Dec 5, 2019

@enzoferey yep – I think the idea is great here. Unfortunately this repo is not at the top of all of my OSS repos in terms of priority so it doesn't get as much attention. As such, I've made you a collaborator @enzoferey so you can push this forward as you see fit.

@enzoferey
Collaborator

Thanks for the trust @tommoor.

I will work over the weekend to close this PR @gencer @abnersajr 👍

@enzoferey enzoferey changed the base branch from master to all-emojis-dataset-pipeline December 7, 2019 11:21
@enzoferey
Collaborator

Merging this into the branch where I will implement the pipeline defined at #41 (comment) in order to start with the maximum amount of aliases in our customAliases.json file.

Thanks @gencer 👍

@enzoferey enzoferey merged commit ad78b98 into tommoor:all-emojis-dataset-pipeline Dec 7, 2019

4 participants