Implement the MUNI/33/0769/2022 R&D project #8

Merged: Witiko merged 67 commits into main from developer on Mar 20, 2023

MarekToma (Contributor) commented Feb 27, 2023

This pull request merges all changes proposed in #1, #5, #6, and #7 into the main branch.

MarekToma and others added 4 commits February 27, 2023 15:17
* added evaluation metrics module

* added bpref metric

* added unit tests for evaluation_metrics module

* fixed some style and type errors

* fixed some type errors

* fixed type error

* made the requested changes to EvaluationBase and added an option to choose the evaluation depth to it and its child classes

* fixed some style errors

* removed unused method from EvaluationBase

* removed unit tests for no longer existing functions

* added checks to avoid division by zero
* added ensembles module

* fixed some typos and style checks

* added rbc ensemble and updated setup.py to include scikit-learn
* Create and push Docker tags

Install htop

Load questions in script.download_datasets

Only download text collection for ARQMath in scripts.download_datasets

Create root directory with mode 777 in scripts.download_datasets

Add "notebooks" extra requirements to setup.py

Add text+tangentl math format for ARQMath

Increment patch version of pv211-utils

Fix syntax error in setup.py

Fix style error in script.download_datasets

Fix test.arqmath.test_loader.TestLoadQueriesTangentL

Update gdown in Jupyter notebooks

Fetch tags when building Docker image in CI

Replace Google Drive with HTTP

Do not redownload TREC and ARQMath datasets if they exist

Add md5_also_ok to manifest files to allow for two different versions of a dataset

Do not download gdown in Jupyter notebooks

Support creating fat Docker images out-of-box

Update links to collection processing notebooks

added arqmath3 judgements, created datasets module, and added arqmath class to datasets module

changed and expanded arqmath class, added cranfield and trec classes, added docstrings, fixed code convention issues

added beir dataset interface, shuffled queries for arqmath and cranfield (added file with the order), and fixed some bugs and naming inconsistencies

added two blank lines to imports to fix stylecheck

deleted miscommitted file

small style change

sync with main

return type change in cranfield loader

fixed some bugs in TrecDataset

Initial commit, migrated files from gitlab

Corrected selected style errors

Replaced google_drive_download with http_download to match master

Added a rudimentary example ipynb notebook for the CQADupStack datasets of the BEIR collection.

Added some basic tests

Fixed some basic style errors and added more tests

Resolved a bug with id collisions on dataset combination, added a Google Sheet leaderboard, added a proper train/dev/test split, plus some minor changes to example beir notebook.

Added sorting to desired dataset input to prevent any unwanted randomness

Minor corrections to beir loader and added proper permissions to leaderboard service account

Added the ability to prevent unnecessary download of data when data is already present.

Resolved some type errors

Added the ability to prevent repetitive download of data when data is already present. ver2

Added the default download location.

Updated actions to use Node.js 16

fixed some typecheck errors

put a file back

Don't specify type of `query` in `irsystem.IRSystemBase`

Update setup.py

fixed some style checks

fixed some type errors

reverted changes to main.yml

fixed nq train set loading

reverting some mistakes

fixed some type errors

reverted changes to main.yml

fixed nq train set loading

reverting some mistakes

reverted some changes

* fixed bugs in combine beir datasets and beir eval and removed some redundant modules and updated beir notebook

* extended beir.loader tests to cover datasets combining and splitting

* fixed typechecks in test/beir/testloader

---------

Co-authored-by: Vít Novotný <[email protected]>
* add full doc preprocessing

* systems preprocessing

* fix code style

* add math preprocessing

---------

Co-authored-by: MarekToma <[email protected]>
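
For readers unfamiliar with the bpref metric and the division-by-zero checks mentioned in the commit messages above, the following is a minimal sketch of how such a metric might be computed for a single query. It is not the code merged in this pull request: the function name `bpref`, its parameters, and the choice to return 0.0 when no relevant documents are judged are assumptions made for illustration, and the formula follows the common `min(R, N)` variant of bpref.

```python
from typing import Sequence, Set


def bpref(ranking: Sequence[str], relevant: Set[str], nonrelevant: Set[str]) -> float:
    """Illustrative bpref for one query (hypothetical helper, not the pv211_utils API).

    ranking: retrieved document ids, best first, without duplicates.
    relevant / nonrelevant: ids judged relevant / judged non-relevant.
    Unjudged documents are ignored, which is the point of bpref.
    """
    num_relevant = len(relevant)
    if num_relevant == 0:
        # Guard against division by zero: no judged relevant documents.
        return 0.0
    denominator = min(num_relevant, len(nonrelevant))
    nonrelevant_seen = 0
    score = 0.0
    for doc_id in ranking:
        if doc_id in relevant:
            if denominator == 0:
                # Second guard: no judged non-relevant documents exist,
                # so nothing can be ranked above this relevant document.
                score += 1.0
            else:
                score += 1.0 - min(nonrelevant_seen, denominator) / denominator
        elif doc_id in nonrelevant:
            nonrelevant_seen += 1
    return score / num_relevant


example = bpref(["d1", "d2", "d3", "d4"], {"d1", "d3"}, {"d2", "d4"})
```

In this example, `d1` is retrieved before any judged non-relevant document and contributes 1, while `d3` has one judged non-relevant document above it and contributes 1 - 1/2, so `example` is 0.75.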
MarekToma marked this pull request as draft on February 27, 2023 15:23
Witiko changed the title from "Developer" to "Merge the implementation of the R&D project MUNI/33/0769/2022" on Feb 28, 2023
Witiko self-requested a review on February 28, 2023 21:39
Witiko changed the title from "Merge the implementation of the R&D project MUNI/33/0769/2022" to "Implement the MUNI/33/0769/2022 R&D project" on Feb 28, 2023
Review threads (all outdated and resolved):
pv211_utils/arqmath/eval.py
pv211_utils/beir/entities.py
pv211_utils/beir/entities.py
pv211_utils/beir/eval.py
pv211_utils/beir/eval.py
pv211_utils/beir/loader.py
pv211_utils/beir/loader.py
pv211_utils/ensembles.py
pv211_utils/eval.py
Witiko added 2 commits March 20, 2023 18:44
…processing`"

This reverts commit 34a4bb2ece523e9a944d3a1c804a5602b1c2753d.
Witiko marked this pull request as ready for review on March 20, 2023 17:46

Witiko (Member) left a comment

Looks serviceable. Many thanks to @MarekToma and @VojtechKalivoda for fixing the biggest issues. I opened tickets for non-critical issues: #9, #10, #11, #12

Witiko merged commit 0d75206 into main on Mar 20, 2023
Witiko deleted the developer branch on March 20, 2023 18:17

MarekToma (Contributor, Author) commented Mar 20, 2023

@Witiko I just made the change to make the eval metrics faster (according to your suggestion). Maybe we should merge that one too :D (Sorry for being late, I hadn't noticed you had merged it in the meantime.)

Witiko (Member) commented Mar 20, 2023

@MarekToma Thanks, but we'll have to do that one later. Please feel free to file it as a pull request that closes #12. I expect to livepatch some of the smaller issues, such as this one, after the release of the second term project assignment. There is no big hurry as long as num_workers defaults to 1.

3 participants