NTCIR MIaS Search Deployment – Evaluates NTCIR MIaS Search on NTCIR Math datasets
NTCIR MIaS Search Deployment is a GNU make makefile that automates the deployment of WebMIaS, and the configuration and execution of the various MIR packages, resulting in an evaluation of the NTCIR MIaS Search system on the NTCIR Math datasets.
Before running make, you will need to adjust the definitions of the following variables in the file definitions.mk to match your environment:
CPU_NUMBER – The number of threads the tools will use
TOPICS_NTCIR10_FS – The path to the NTCIR-10 Math topics for the math retrieval formula search subtask
TOPICS_NTCIR10_FT – The path to the NTCIR-10 Math topics for the math retrieval full-text search subtask
TOPICS_NTCIR11 – The path to the NTCIR-11 Math-2 topics for the main task
TOPICS_NTCIR12_QUERIES – The path to the NTCIR-12 MathIR topics for the arXiv main task
TOPICS_NTCIR12_SIMTO – The path to the NTCIR-12 MathIR topics for the optional arXiv similarity task
JUDGEMENTS_NTCIR10_FS – The path to the NTCIR-10 Math judgements for the math retrieval formula search subtask
JUDGEMENTS_NTCIR10_FT – The path to the NTCIR-10 Math judgements for the math retrieval full-text search subtask
JUDGEMENTS_NTCIR10_FS – The path where the converted NTCIR-10 Math judgements for the math retrieval formula search subtask will be placed by the NTCIR-10 Math Converter package
JUDGEMENTS_NTCIR10_FS – The path where the converted NTCIR-10 Math judgements for the math retrieval full-text search subtask will be placed by the NTCIR-10 Math Converter package
JUDGEMENTS_NTCIR11 – The path to the NTCIR-11 Math-2 judgements for the main task
JUDGEMENTS_NTCIR12_QUERIES – The path to the NTCIR-12 MathIR judgements for the arXiv main task
JUDGEMENTS_NTCIR12_SIMTO – The path to the NTCIR-12 MathIR judgements for the optional arXiv similarity task
DATASET_NTCIR10 – The path to the NTCIR-10 Math dataset
DATASET_NTCIR10_CONVERTED – The path where the converted NTCIR-10 Math dataset will be placed by the NTCIR-10 Math Converter package
DATASET_NTCIR11_12 – The path to the NTCIR-11 Math-2 and NTCIR-12 MathIR dataset
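Since building the larger targets takes many hours, it can be worth checking that the input paths exist before starting. The following is a minimal sketch and not part of the makefile: the helper name missing_inputs is hypothetical, and it assumes that definitions.mk contains plain assignments such as NAME = /some/path with literal values (no references to other variables and no trailing comments).

```shell
# Report variables that are undefined in a makefile fragment or whose
# value is a path that does not exist.
# $1: path to definitions.mk; remaining arguments: variable names to check.
# Assumption: each variable is set by a simple "NAME = value",
# "NAME := value" or "NAME ?= value" line holding a literal path.
missing_inputs() {
  file=$1
  shift
  for name in "$@"; do
    # Strip trailing whitespace, then keep the text after the assignment
    # operator; if a variable is assigned more than once, the last one wins.
    value=$(sed -n -E \
      -e 's/[[:space:]]*$//' \
      -e "s/^[[:space:]]*${name}[[:space:]]*[?:]?=[[:space:]]*//p" \
      "$file" | tail -n 1)
    if [ -z "$value" ]; then
      echo "$name: not defined"
    elif [ ! -e "$value" ]; then
      echo "$name: $value does not exist"
    fi
  done
}
```

For example, `missing_inputs definitions.mk TOPICS_NTCIR11 DATASET_NTCIR10 DATASET_NTCIR11_12` prints one line per variable that is undefined or points to a nonexistent path, and prints nothing when every path exists. Output locations such as DATASET_NTCIR10_CONVERTED need not exist beforehand and should be left out of the check.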
The following command removes any results of the previous run and builds all targets:
$ make clean all
Make runs with maximum verbosity and creates a file named Makefile.log in the current working directory that contains all output along with timing information. The evaluation results are stored in the directory named by the RESULTS_NTCIR11 variable defined in definitions.mk.
Note that although the clean pseudotarget removes all results, you will need to manually shut down any running instances of Apache Tomcat that were started as a result of the previous runs.
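One way to find leftover instances is to look for Java processes running Tomcat's main class. The sketch below assumes the instances were started through Tomcat's standard startup scripts, so that org.apache.catalina.startup.Bootstrap appears on their command line; the helper name tomcat_pids_from is hypothetical.

```shell
# Read "PID COMMAND..." lines on standard input and print the PID of every
# line whose command mentions Tomcat's bootstrap class.
# Assumption: Tomcat was launched via its standard scripts, which run
# the class org.apache.catalina.startup.Bootstrap.
tomcat_pids_from() {
  awk '/org\.apache\.catalina\.startup\.Bootstrap/ { print $1 }'
}
```

Running `ps -e -o pid= -o args= | tomcat_pids_from` lists the matching process IDs. After checking that they belong to the previous run and not to an unrelated Tomcat installation, they can be passed to `kill`.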
Executing the make plot command produces a dependency graph of the individual targets in the makefile.
Disregarding the datasets that you need to download beforehand, make will produce about 157G of data, of which 93G are MIaS indexes, 51G is the converted NTCIR-10 Math dataset, and 500M are the final results.
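A quick check of the free space on the target filesystem can catch a shortfall before a long run. This is a rough sketch under stated assumptions: the 157G figure is read here as GiB, all output is assumed to land on a single filesystem, and the names REQUIRED_KB, available_kb and enough_space are hypothetical.

```shell
# Space the run is expected to need, in 1024-byte blocks.
# Assumption: the 157G reported above is interpreted as 157 GiB.
REQUIRED_KB=$((157 * 1024 * 1024))

# Print the available space, in 1024-byte blocks, of the filesystem
# that holds the path given as $1.
available_kb() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed if the available space $1 is at least the required space $2.
enough_space() {
  [ "$1" -ge "$2" ]
}
```

For instance, `enough_space "$(available_kb .)" "$REQUIRED_KB" || echo "not enough free space"` warns when the current directory's filesystem has too little room.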
The following table shows how long it takes on average to construct the individual targets on a machine with 448G of RAM and eight Intel Xeon™ X7560 2.26 GHz CPUs. Only targets that take longer than a minute are listed:
Target name | Wall clock time | Number of testing runs |
---|---|---|
$(INDEX_NTCIR11_12) | 17h 53m 34s | 1 |
$(INDEX_NTCIR10) | 9h 2m 56s | 2 |
$(NTCIR_MATH_DENSITY_NTCIR11) | 1h 5m 30s | 4 |
$(NTCIR_MATH_DENSITY_ALL) | 52m 21s | 3 |
$(NTCIR_MATH_DENSITY_ALL_WITHOUT_NTCIR10) | 41m 50s | 3 |
$(DATASET_NTCIR10_CONVERTED) | 23m 26s | 1 |
$(RESULTS_NTCIR11) | 2m 4s | 1 |
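To get a rough feel for the total, the listed times can be added up. The sketch below only does the arithmetic: it assumes the targets are built one after another, which may not match how make actually schedules them, and it ignores targets that take less than a minute. The helper names are hypothetical.

```shell
# Convert a duration such as "17h 53m 34s" or "52m 21s" to seconds.
to_seconds() {
  echo "$1" | awk '{
    total = 0
    for (i = 1; i <= NF; i++) {
      n = $i + 0                      # numeric prefix, e.g. 53 from "53m"
      if ($i ~ /h$/) total += n * 3600
      else if ($i ~ /m$/) total += n * 60
      else if ($i ~ /s$/) total += n
    }
    print total
  }'
}

# Format a number of seconds as "Hh Mm Ss".
format_duration() {
  printf '%dh %dm %ds\n' $(($1 / 3600)) $(($1 % 3600 / 60)) $(($1 % 60))
}

# Add up all durations given as arguments and print the formatted total.
sum_durations() {
  total=0
  for d in "$@"; do
    total=$((total + $(to_seconds "$d")))
  done
  format_duration "$total"
}
```

Calling `sum_durations "17h 53m 34s" "9h 2m 56s" "1h 5m 30s" "52m 21s" "41m 50s" "23m 26s" "2m 4s"` prints `30h 1m 41s`, so a sequential build of the listed targets on comparable hardware would take a little over 30 hours.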