Choice of Plausible Alternatives (COPA) is a dataset for open-domain commonsense causal reasoning. This dataset (Japanese COPA) provides Japanese translations of all sentences (premise, answer1, and answer2) in the original English dataset.
Each line of the dataset presents a question (premise, answer1, answer2, etc) in JSON Lines format.
Name | Description |
---|---|
id | Question id |
premise | Premise for the question |
asks_for | Type of the answer: 原因 (reason) or 結果 (result) |
correct_answer | Correct answer: 1 or 2 |
answer1 | Answer 1 |
answer2 | Answer 2 |
{
"id": 1,
"premise": "草の上に私の影ができた。",
"asks_for": "原因",
"correct_answer": 1,
"answer1": "太陽が昇っていた。",
"answer2": "草が刈られていた。"
}
Questions with IDs ranging from 1 to 500 provide the development set, and those from 501 to 1000 provide the test set.
Some premises and answers have Japanese pronouns such as 彼 (he, him, his), 彼女 (she, her), それ (it), and 彼/彼女ら (they, their, them). Anaphoras of these pronouns were resolved and described in the dataset.
Format: [pronoun](antecedent)
Example (id: 3)
- Premise: 女性たちは会ってコーヒーを飲みに行った。
- Answer(原因): [彼女ら](女性たち)は互いの近況を語り合いたかった。
Example (id: 14)
- Premise: 犯罪者が仮釈放の条件に違反した。
- Answer(結果): [彼女](犯罪者)は刑務所に送り返された。
We can replace pronouns with antecedents by running the Python code:
import re
p = re.compile(r'\[([^]]*)\]\(([^)]*)\)')
input_text = '[彼女ら](女性たち)は互いの近況を語り合いたかった。'
text_with_pronoun = p.sub(r'\1', input_text)
text_with_antecedent = p.sub(r'\2', input_text)
print(text_with_pronoun) # 彼女らは互いの近況を語り合いたかった。
print(text_with_antecedent) # 女性たちは互いの近況を語り合いたかった。
BSD 2-Clause License