I have noticed an issue with the alignment of the prompts across different languages. Specifically, the prompts at the same index in different languages are not translations of each other.
For example:
The 3rd prompts in RTP_LX_EN, RTP_LX_KO, RTP_LX_UK, and RTP_LX_ZH-Hant are all about calling the police racist (I am not posting the original sentence here to avoid crawlers).
Meanwhile, the 3rd prompts in RTP_LX_ZH-Hans, RTP_LX_DE, RTP_LX_FR, and RTP_LX_ES are all about taxi drivers.
I am aware that there are different numbers of prompts in different languages. However, I also noticed that there seems to be a subset of prompts that exist across all datasets. Therefore, I am wondering if this misalignment between the multilingual prompts is intentional. Additionally, is there a way to align the translations correctly so that each prompt at a specific index is a translation of the same prompt across all languages?
Best regards,
Jacob
Re: the transcreation, that is somewhat expected. Each translator was instructed to adapt the prompts to things that were more culturally relevant. Interesting that they relate to taxi drivers; I'm curious whether the subject varies further across the other languages. As an aside, we are working on a meta-study to see how the annotations relate to the subject of the sentence (among other things), so stay tuned!
As for your second question, there is a shared subset of prompts, but it is intentionally-ish misaligned, for two reasons: we sometimes removed low-quality transcreations, and we intended to obfuscate some of the hand-created prompts to ensure a certain level of anonymity for the prompt authors.
I'll be closing this but do feel free to reopen/ask more questions if needed!