Evaluating GPT-4 and ChatGPT on Japanese Medical Licensing Examinations

Introduction

We evaluate LLMs (GPT-3, GPT-4, and ChatGPT) on Japanese medical licensing examinations from the past five years (2018-2022) and release the data as the IGAKU QA (医学 QA) benchmark.

Benchmark Collection

We collect the exam problems and their answers from the past five years (2018 through 2022) from the official website of the Ministry of Health, Labour and Welfare in Japan. Note that we do not rely on translations of sources from other languages (e.g., English) or countries; the benchmark comes solely from resources originally written in Japanese. See our paper for more details.

Baselines

See the scripts that we used for the experiments in our paper. Note that you need an OpenAI API key to run these baselines.
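As a rough illustration of the two steps around a baseline run, the sketch below checks that an API key is available and scores predictions against gold answers. It is not taken from the repository's scripts: the helper names `require_api_key` and `exact_match_accuracy`, the use of the `OPENAI_API_KEY` environment variable, and the representation of each answer as a list of choice labels are all assumptions made for this example.

```python
import os


def require_api_key(env=None, var_name="OPENAI_API_KEY"):
    """Return the API key from the environment, or raise with a clear message.

    Assumption: the key is supplied through an environment variable.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set {var_name} before running the baselines.")
    return key


def exact_match_accuracy(predictions, gold):
    """Fraction of problems whose predicted answer set equals the gold set.

    Assumption: each answer is a list of choice labels, and order is ignored.
    """
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        return 0.0
    correct = sum(set(p) == set(g) for p, g in zip(predictions, gold))
    return correct / len(gold)
```

Treating answers as sets means a problem with several correct choices counts as correct only when every choice matches, which may or may not be the scoring rule used in the paper.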

Citations

IgakuQA and our evaluations on Japanese medical licensing examinations

@misc{jpn-med-exam_gpt4,
  author    = {Jungo Kasai and Yuhei Kasai and Keisuke Sakaguchi and Yutaro Yamada and Dragomir Radev},
  title     = {Evaluating {GPT}-4 and {ChatGPT} on {J}apanese Medical Licensing Examinations},
  year      = {2023},
  url       = {},
}

