Issue with Alignment of Multilingual Prompts in RTP LX Dataset #2

SeuperHakkerJa · 2024-06-03T15:43:17Z

Description:

Thanks for the great dataset!

I have noticed an issue with the alignment of the prompts across different languages. Specifically, the prompts at the same index in different languages are not translations of each other.

For example:

The 3rd prompt in RTP_LX_EN, RTP_LX_KO, RTP_LX_UK, and RTP_LX_ZH-Hant is about calling the police racist (I am not posting the original sentence here to avoid crawlers).
Meanwhile, the 3rd prompt in RTP_LX_ZH-Hans, RTP_LX_DE, RTP_LX_FR, and RTP_LX_ES is about taxi drivers.

I am aware that there are different numbers of prompts in different languages. However, I also noticed that there seems to be a subset of prompts that exist across all datasets. Therefore, I am wondering if this misalignment between the multilingual prompts is intentional. Additionally, is there a way to align the translations correctly so that each prompt at a specific index is a translation of the same prompt across all languages?

Best regards,
Jacob

adewynter · 2024-06-21T21:10:00Z

Hi, sorry for the delay in response!

Re: the transcreation, that is somewhat expected. Each translator was instructed to adapt the prompts to things that were more culturally relevant. Interesting that they relate to taxi drivers; I'm curious whether the subject varies further across the other languages. As an aside, we are working on a meta-study to see how the annotations relate to the subject of the sentence (among other things), so stay tuned!

As for your second question, there is a shared subset of prompts, but it is intentionally-ish misaligned. There are two reasons for this: we sometimes removed low-quality transcreations, and we intended to obfuscate some of the hand-created prompts to ensure a certain level of anonymity for the prompt authors.

I'll be closing this but do feel free to reopen/ask more questions if needed!

adewynter closed this as completed Jun 21, 2024
