
[FEATURE] Init an evaluation with a predefined global_uid_map and global_iid_map #581

Open
lthoang opened this issue Jan 9, 2024 · 3 comments

lthoang commented Jan 9, 2024

Description

The current global_uid_map and global_iid_map are reset whenever an evaluation method is built: https://github.com/PreferredAI/cornac/blob/f2d44cec7272f01d344c007312d51bc3644968b9/cornac/eval_methods/base_method.py#L646C36-L646C36

Expected behavior with the suggested feature

If a global_uid_map or global_iid_map dictionary is provided, use it instead of rebuilding the dictionary from the data.
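
A minimal sketch of the requested behavior, assuming the OrderedDict-based maps Cornac uses internally. build_maps is a hypothetical helper for illustration, not an existing Cornac function:

```python
from collections import OrderedDict


def build_maps(user_ids, item_ids, global_uid_map=None, global_iid_map=None):
    """Reuse pre-built id maps when given; otherwise build them from the data.

    Illustrative only: shows the proposed behavior of not resetting the maps
    if the caller already supplied them.
    """
    uid_map = OrderedDict() if global_uid_map is None else global_uid_map
    iid_map = OrderedDict() if global_iid_map is None else global_iid_map
    for u in user_ids:
        uid_map.setdefault(u, len(uid_map))  # only unseen users get new indices
    for i in item_ids:
        iid_map.setdefault(i, len(iid_map))  # only unseen items get new indices
    return uid_map, iid_map
```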

Other Comments


tqtg commented Jan 9, 2024

Could you give an example of why we would want to build an eval method from pre-built uid/iid maps when the train/val/test datasets are provided?


lthoang commented Jan 10, 2024

@tqtg Taking the Streaming Session-based Recommendation (SSR) scenario as an example, a dataset is split chronologically with a 60:40 ratio. The latter 40% is then split into 5 folds (8% of the data each). A model trained on the first 60% is validated on the first 8% fold and then tested on the next 8% fold. After that, the tested fold is added to the training data and the model is tested on the following 8% fold. This process repeats until the last fold has been tested.
Because the training data grows over time, the increasing number of items makes it difficult to compare results across the test folds.

If global_iid_map is given, the evaluation process ranks the same number of items in every fold. Hence, it helps us compare the performance of different metrics across test folds. The SSR implementation also specifies the number of users and items here.
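
A minimal sketch of this splitting scheme, assuming the interactions are already sorted by time. The function and variable names below are illustrative only and are not part of the SSR code or of Cornac:

```python
def ssr_splits(interactions, train_ratio=0.6, n_folds=5):
    """Yield (train, val, test) splits for the incremental scheme above.

    Illustrative only: `interactions` is any chronologically sorted list.
    The first of the folds serves as validation; the remaining folds are
    tested one after another while the training data keeps growing.
    """
    n = len(interactions)
    train_end = int(n * train_ratio)
    fold_size = (n - train_end) // n_folds
    folds = [
        interactions[train_end + k * fold_size : train_end + (k + 1) * fold_size]
        for k in range(n_folds)
    ]
    train = list(interactions[:train_end])
    val = list(folds[0])
    for test in folds[1:]:
        yield train, val, list(test)
        train = train + list(test)  # the tested fold joins the training data
```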


tqtg commented Jan 10, 2024

Looking at the way they do evaluation in the paper, I don't think it can be simply accommodated. After each step, they don't retrain the model but only fine-tune it with the additional 8% of test data from the previous step. We don't have a clear path to support that yet. Let's take a step back and think about the whole evaluation scheme first before trying to fix this small thing.
