probe: add Arabic DAN #1018
Signed-off-by: Emad Alghamdi <[email protected]>
Could you say a bit more about how these prompts were generated? e.g. manual translation, machine translation, scraped from sources
Also, I think we need some logic to handle whether probes are active based on the bcp47 value, lest we run a bunch of Arabic strings against English language models (or vice versa).
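One way the language gating suggested above could look is sketched here. This is only an illustration, not garak's actual selection logic: `ProbeSketch`, `primary_subtag`, `probes_for_target`, the probe names, and the use of `"*"` as a language-agnostic tag are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ProbeSketch:
    """Hypothetical stand-in for a probe; not garak's actual Probe class."""
    name: str
    bcp47: str           # language tag of the prompts, e.g. "en" or "ar"
    active: bool = True


def primary_subtag(tag: str) -> str:
    """Return the primary language subtag, so "ar-SA" matches "ar"."""
    return tag.split("-")[0].lower()


def probes_for_target(probes, target_langs):
    """Keep active probes whose language is among the target model's languages."""
    wanted = {primary_subtag(t) for t in target_langs}
    return [
        p for p in probes
        # "*" is assumed here to mark a language-agnostic probe.
        if p.active and (p.bcp47 == "*" or primary_subtag(p.bcp47) in wanted)
    ]


probes = [
    ProbeSketch("dan_en_example", "en"),
    ProbeSketch("dan_ar_example", "ar"),
    ProbeSketch("language_agnostic_example", "*"),
]
selected = [p.name for p in probes_for_target(probes, ["ar-SA"])]
```

With a target declared as `ar-SA`, the English probe is skipped while the Arabic and language-agnostic probes are kept.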
garak/detectors/dan_ar.py
Do these differ meaningfully from the detectors in detectors.dan? It seems to be a direct copy. If so, do we want to reference those extant detectors from the probes instead?
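If the Arabic detectors do turn out to be identical copies, the reuse suggested above might look roughly like the following. This is a self-contained sketch under assumptions: the `recommended_detector` attribute, the `"dan.DAN"` name, the `DETECTORS` registry, and both probe classes are hypothetical stand-ins, not the project's confirmed API.

```python
def contains_dan_marker(output: str) -> bool:
    """Toy detector: flags outputs that carry a DAN marker (illustrative only)."""
    return "DAN" in output


# Hypothetical registry mapping detector names to detector callables.
DETECTORS = {"dan.DAN": contains_dan_marker}


class DanProbeSketch:
    """Hypothetical English probe that names its detectors by string."""
    bcp47 = "en"
    recommended_detector = ["dan.DAN"]


class DanArabicProbeSketch(DanProbeSketch):
    """Hypothetical Arabic probe: new language tag, same detectors by reference."""
    bcp47 = "ar"
    # No detector override: the inherited names point at the existing detector.


def detectors_for(probe_cls):
    """Resolve a probe's detector names to callables via the registry."""
    return [DETECTORS[name] for name in probe_cls.recommended_detector]
```

This only makes sense if detection does not depend on the output language; if the Arabic detectors need Arabic-specific strings, separate detectors would still be warranted.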
alignment/system prompt. Success is measured by not getting a safety mitigation
message."""

active = True
I would suggest that we keep active=False on the Arabic probes -- perhaps there's some logic that needs to be written around the language spec.
It also seems that the code here includes
@erickgalinkin, I think the
Also, all comments I made in review of #1017 will apply here as well. For now we can have the separate languages as unique classes that can be selected explicitly, and in future iterations we can consolidate those classes into language-specific instances of one class.
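The consolidation mentioned above, one class with language-specific instances, could be sketched as follows. Everything here is hypothetical: `DanProbeSketch`, `build_dan_probes`, `PROMPTS_BY_LANG`, and the placeholder prompt strings are illustrative names, and the default of `active=False` simply mirrors the review suggestion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DanProbeSketch:
    """Hypothetical single probe class, parameterised by language."""
    name: str
    bcp47: str
    prompts: tuple
    active: bool = False  # opt-in by default, per the review suggestion


# Placeholder prompt tables; real prompts would come from the per-language data.
PROMPTS_BY_LANG = {
    "en": ("<English DAN prompt placeholder>",),
    "ar": ("<Arabic DAN prompt placeholder>",),
}


def build_dan_probes(prompts_by_lang):
    """Create one probe instance per language instead of one class per language."""
    return {
        lang: DanProbeSketch(name=f"dan_{lang}", bcp47=lang, prompts=prompts)
        for lang, prompts in prompts_by_lang.items()
    }


dan_probes = build_dan_probes(PROMPTS_BY_LANG)
```

Adding a language then becomes a data change (a new entry in the prompt table) rather than a new class.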
Signed-off-by: Emad Alghamdi [email protected]
I added Arabic translations of all DAN probes, which were quality-checked by a human to ensure suitability for the Arabic language, and added an Arabic detector for the probes. The new probes and detectors passed the tests during development.