Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Supports for single quotes in personal title #54

Merged
merged 1 commit into from
Jul 1, 2019

Conversation

Joxit
Copy link
Member

@Joxit Joxit commented Jun 26, 2019

Background

In the French grammar, there are 8 article le, la, les, un, une, du, de, la. When an article like le or de is used with a noun which starts with a vowel, the article's vowel is replaced by a simple quote. For example, we don't write Le Amiral but L'Amiral.

So what ?

This PR fixes these edge cases which are common.
I think this should be spread in some other classifiers (such as GivenNameClassifier)

Copy link
Member

@missinglink missinglink left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple questions:

  • Is the single-quote always in the second position or is that not always true?
  • Is it possible to generate these synonyms at index-time rather than query-time? What difference would that make to performance and memory usage?

Note: you can use the 'backtick' in JS to avoid having to escape single quotes

@Joxit
Copy link
Member Author

Joxit commented Jun 27, 2019

  • Is the single-quote always in the second position or is that not always true?

With articles, the single-quote is always in the second position.

  • Is it possible to generate these synonyms at index-time rather than query-time? What difference would that make to performance and memory usage?

Oh! Yes, it's totally possible to generate synonyms at index time. For French personal titles from libpostal, there is only amiral which need to be generated => small CPU and memory footprint. That means I can add this directly here : resources/pelias/dictionaries/libpostal/fr/. In addition to this one, I could add missing titles like adjudant/l'adjudant, empereur/l'empereur and inspecteur/l'inspecteur.

This will also simplify this code 👍

Thanks 😄

@Joxit Joxit force-pushed the joxit/fra-single-quote-title branch from 684866d to 9495ccd Compare June 27, 2019 09:52
@Joxit Joxit merged commit cd87ea4 into master Jul 1, 2019
@Joxit Joxit deleted the joxit/fra-single-quote-title branch July 1, 2019 14:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants