[Feature Request] Support for phrases #21
Comments
This is an interesting feature. I'm not sure how to address this lemma problem though. Simply lemmatizing each of the words would not really work, since the example you give will simply become |
It would search the surface form first, |
You managed to build dictionaries for Yomichan?? How did you do that? Is there an article somewhere? |
I'm really excited too. This will most likely be the first European language Yomichan dictionary ever. I had to make a custom version of Yomichan that parses words based on space separators rather than character by character. There was also the issue where Yomichan parsed words in each element after the one I hovered over even if the element didn't have the same parent. Making the dictionary was the easiest part. I have Wiktionary JSONs for multiple European languages but Russian is the most fleshed out one. Previously, I was basically reinventing the wheel and remaking an Electron version of Yomichan Search for Russian, but it looks like I can drop that project and work on this instead. There are still some features I want to add, and it needs to undergo testing, but here's what I have so far: I connected a Forvo audio server, and that's working too. |
Great work!
|
All my work is on my computer, but I'm planning on putting it on a couple repositories when I'm satisfied with this Yomichan project. The code for generating Wiktionary JSONs will also be available. I don't plan on contributing to vocabsieve since this project should cover all my needs. Inflections are easily handled since form data is provided in the JSONs. I get the JSONs from here (raw Wiktionary data), then I run a script that pulls out all the useful information into a new JSON file. Then I run another script to create dictionary entries. It's kind of complicated, so I'll have to clean that up a bit. Here's what the data looks like after all that: From there, I run another script on that JSON to create a compatible Yomichan dictionary. Non-lemma words get their own definition in Yomichan. If you turn on Unfortunately |
Here's the GitHub repository for extracting relevant data from the Kaikki Wiktionary rip. Let me know if you have any problems with it. |
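For readers unfamiliar with the Kaikki dumps, the extraction step described above might look roughly like the sketch below. It assumes the usual Kaikki layout of one JSON object per line with `word`, `pos`, `senses[].glosses` and `forms[].form` fields. The function name `extract_entry` and the shape of the returned dict are hypothetical and are not taken from the linked repository.

```python
import json


def extract_entry(line):
    """Reduce one raw Kaikki JSON line to the fields a dictionary needs.

    The output layout here is an illustrative assumption, not the
    format produced by the repository mentioned above.
    """
    raw = json.loads(line)
    # Flatten the glosses of every sense into a single list of definitions.
    glosses = [
        gloss
        for sense in raw.get("senses", [])
        for gloss in sense.get("glosses", [])
    ]
    # Collect the distinct inflected forms so non-lemma words can be mapped
    # back to this entry later.
    forms = sorted({f["form"] for f in raw.get("forms", []) if "form" in f})
    return {
        "word": raw.get("word"),
        "pos": raw.get("pos"),
        "glosses": glosses,
        "forms": forms,
    }


# A made-up sample line in the assumed layout, for illustration only.
sample = json.dumps({
    "word": "ум",
    "pos": "noun",
    "senses": [{"glosses": ["mind"]}],
    "forms": [{"form": "ума"}, {"form": "умы"}],
})
entry = extract_entry(sample)
```

Keeping the inflected forms alongside each entry is what would let a later step give non-lemma words their own definitions.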
In the example sentence "Как не сойти с ума, когда вокруг одна лишь тьма?", the sentence contains the phrase "сойти с ума". There is a Wiktionary definition available for this phrase.
However, when I double-click "сойти", I only get the definition for that one word, not the phrase. Ideally, the word search feature would look ahead until it reached a punctuation character, and then look up progressively shorter word sequences, from the full span down to the word you first selected.
Here's an example of how that would work:
Sentence: чтобы хотя бы так поддержать твои старания
Selected word: хотя
хотя бы так поддержать твои старания: looked up to see if it's a lemma or non-lemma Russian word.
хотя бы так поддержать твои
хотя бы
This approach isn't viable when using an online dictionary. If this feature were added, using it would have to require an offline dictionary.
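The look-ahead described above can be sketched as follows. This is a minimal illustration, not an implementation from the project: the `ENTRIES` set stands in for a hypothetical offline dictionary, `BOUNDARY` is an assumed (non-exhaustive) set of punctuation characters, and the function name `longest_phrase` is made up. It matches surface forms only, so inflected phrases would still need the lemma handling discussed in the comments, and it uses the first occurrence of the selected word rather than the actual click position.

```python
import re

# Hypothetical offline headword set; a real one would be loaded from a
# local dictionary file.
ENTRIES = {"хотя", "хотя бы", "сойти", "сойти с ума"}

# Characters treated as phrase boundaries (an assumption, not a full list).
BOUNDARY = re.compile(r"[.,!?;:()«»\"]")


def longest_phrase(sentence, selected, entries):
    """Return the longest entry starting at `selected`, or None."""
    for clause in BOUNDARY.split(sentence):
        words = clause.split()
        if selected not in words:
            continue
        start = words.index(selected)
        # Try the longest candidate first, dropping one trailing word
        # each time until only the selected word is left.
        for end in range(len(words), start, -1):
            candidate = " ".join(words[start:end])
            if candidate.lower() in entries:
                return candidate
    return None


match_1 = longest_phrase(
    "Как не сойти с ума, когда вокруг одна лишь тьма?", "сойти", ENTRIES
)
match_2 = longest_phrase(
    "чтобы хотя бы так поддержать твои старания", "хотя", ENTRIES
)
```

Each candidate costs one lookup, so a long clause means several lookups per click. That is cheap against a local set, which fits the offline-dictionary requirement stated above.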