Tagger bug (assertion length mismatch) #77
Comments
UPDATE: |
Well. This error is caused by a space in the word: <w xml:id="HuygensING-stellingwerff-1-1_62beb0b1-1ddd-4ef2-866d-21362b73ab83.text.text.1.body.1.div1.1.div2.2.div3.1.p.1.s.2.w.81" class="WORD" datetime="2017-06-27T14:09:02" set="tokconfig-nld">
<t>opte</t>
<t class="contemporary">op die</t>
<lemma class="MNW:39184⊕90021" set="https://raw.githubusercontent.com/proycon/folia/master/setdefinitions/int_lemmaid_withcompounds.foliaset.ttl"/>
<lemma class="op⊕die" set="https://raw.githubusercontent.com/proycon/folia/master/setdefinitions/int_lemmatext_withcompounds.foliaset.ttl"/>
<metric class="modernisationsource" value="inthistlexicon"/>
</w>
Frog tries to handle this as two separate words, 'op' and 'die', but that contradicts the tokenization already present. I suppose this should be changed into |
OK, I found the problem. Once the embedded space is replaced by a normal space, it should work, since Frog replaces embedded spaces with '_'. |
…aking space hack that now causes other problems in frog (LanguageMachines/frog#77)
I improved the Frog code to handle a wide range of embedded "spaces". Testing in progress. |
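For readers following along, here is a minimal Python sketch of the idea described above: collapsing any whitespace-like character inside a single token into '_'. This is not Frog's actual code (Frog is written in C++). The function name `join_embedded_spaces` and the exact set of characters covered are assumptions made for illustration only.

```python
import re

# In Python 3, \s on str patterns matches Unicode whitespace, which includes
# the no-break space (U+00A0) and the thin space (U+2009).
# The zero-width space (U+200B) is not classed as whitespace, so it is
# listed explicitly. Which characters Frog itself treats as "spaces" is
# not stated in this thread; this set is an assumption.
_EMBEDDED_SPACE = re.compile(r"[\s\u200b]+")


def join_embedded_spaces(token_text: str, joiner: str = "_") -> str:
    """Replace each run of whitespace-like characters inside a token by `joiner`.

    Hypothetical helper: it only illustrates the behaviour discussed in the
    thread and is not taken from the Frog sources.
    """
    # Leading and trailing whitespace is dropped first so that the joiner
    # only ever appears between the parts of the token.
    return _EMBEDDED_SPACE.sub(joiner, token_text.strip())


if __name__ == "__main__":
    for text in ["opte", "op die", "op\u00a0die"]:
        print(repr(text), "->", join_embedded_spaces(text))
```

With such a normalization, a token keeps a one-to-one relation with its `<w>` element, so the number of tokens the tagger sees should match the tokenization already present in the document.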
My test collection passes now. If there's nothing else left to do, I think we can close this. |
Assume this is fixed |
Input file:
mlp08:/scratch/proycon/HuygensING-stellingwerff-1-1_62beb0b1-1ddd-4ef2-866d-21362b73ab83.translated.folia.xml
To be investigated further. My first guess is that perhaps not all words have text with textclass contemporary, and the tagger can't handle that?
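Both suspicions raised in this issue (words without a text in the contemporary class, and words whose text contains a space) can be checked with a small script before running the tagger. The sketch below uses only the Python standard library rather than a FoLiA library. The function name `audit_words` and the sample document are hypothetical, and treating a `<t>` without a `class` attribute as class "current" is an assumption about the FoLiA default.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predefined xml: prefix to this namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def local(tag: str) -> str:
    """Strip a '{namespace}' prefix so the check works with any namespace."""
    return tag.rsplit("}", 1)[-1]


def audit_words(xml_text: str, textclass: str = "contemporary"):
    """Return (ids of words lacking `textclass`, ids of words whose text has whitespace)."""
    root = ET.fromstring(xml_text)
    missing, spaced = [], []
    for w in root.iter():
        if local(w.tag) != "w":
            continue
        # Map text class -> text for the <t> children of this word.
        # Assumption: a <t> without a class attribute has class "current".
        texts = {
            t.get("class", "current"): (t.text or "")
            for t in w
            if local(t.tag) == "t"
        }
        if textclass not in texts:
            missing.append(w.get(XML_ID))
        elif any(ch.isspace() for ch in texts[textclass]):
            # str.isspace() is also true for the no-break space (U+00A0).
            spaced.append(w.get(XML_ID))
    return missing, spaced


# Hypothetical miniature document, not the input file from this issue.
sample = (
    '<s xmlns="http://ilk.uvt.nl/folia">'
    '<w xml:id="w.1"><t>opte</t><t class="contemporary">op die</t></w>'
    '<w xml:id="w.2"><t>ende</t></w>'
    '<w xml:id="w.3"><t>ic</t><t class="contemporary">ik</t></w>'
    "</s>"
)

if __name__ == "__main__":
    missing, spaced = audit_words(sample)
    print("no contemporary text:", missing)
    print("whitespace in contemporary text:", spaced)
```

For a real file, the contents could be read with `open(path, encoding="utf-8").read()` and passed to `audit_words`. Any id reported in the second list is a candidate for the length mismatch discussed here.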