
Add lead time to each labelled query and avg lead time for a case #1003

Open
DmitryKey opened this issue Apr 16, 2024 · 3 comments

Comments

@DmitryKey
Contributor

Is your feature request related to a problem? Please describe.
Usually the process in our labelling projects starts like this:

  1. Choose the labelling objective, set up a test project.
  2. Distribute tasks in a group of search experts (we use Excel for coordinating which cases are taken by whom).
  3. Label, learn-rinse-and-repeat, formulate labelling instructions.

Then we proceed with scaling out the labelling process by involving a larger set of people (who might be domain experts, but not search experts).

At this point, knowing the average lead time per unit of work (a query), we know how much workforce to request to meet a specific deadline.

Describe the solution you'd like
Lead time is recorded at the query level and rolled up to the case level.
The lead time can be accessed via the Notebook feature to perform analytics, such as the distribution of lead times per annotator.

Describe alternatives you've considered
Recording this manually, but that is not accurate and increases the complexity of labelling.

Additional context
Label Studio (https://github.com/HumanSignal/label-studio) offers a lead time feature at the task level. The UI is written in React (I believe), so there is probably a way to adopt and adapt the UI logic for Quepid.

@epugh
Member

epugh commented Apr 29, 2024

@DmitryKey Have you looked at the Books infrastructure yet? You can merge various books together into new books, and then use that to populate a Case... There might be some nice operational things in there.

In terms of lead time, I'm wondering if the existing update_time and create_time fields that we create for the Judgement objects and QueryDocPair objects in the database would help you intuit this? I could imagine using the Python notebook, calling some APIs to get the data, and then doing some graphing/charting to predict how long?
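A rough sketch of that idea in notebook form, assuming judgement records can be pulled with `create_time` and `update_time` timestamps (the record shape below is invented for illustration; Quepid's actual API payload may differ):

```python
import pandas as pd

# Hypothetical judgement records as they might come back from a Quepid API
# call; field names and values here are assumptions for illustration.
judgements = [
    {"query": "laptop", "user": "alice",
     "create_time": "2024-04-16T10:00:00", "update_time": "2024-04-16T10:02:30"},
    {"query": "laptop", "user": "bob",
     "create_time": "2024-04-16T10:05:00", "update_time": "2024-04-16T10:06:00"},
    {"query": "tablet", "user": "alice",
     "create_time": "2024-04-16T10:10:00", "update_time": "2024-04-16T10:14:00"},
]

df = pd.DataFrame(judgements)
for col in ("create_time", "update_time"):
    df[col] = pd.to_datetime(df[col])

# Lead time per judgement = update_time - create_time, then rolled up
# per annotator (mean) and per query (sum).
df["lead_seconds"] = (df["update_time"] - df["create_time"]).dt.total_seconds()
per_annotator = df.groupby("user")["lead_seconds"].mean()
per_query = df.groupby("query")["lead_seconds"].sum()

print(per_annotator)
print(per_query)
```

From there the per-annotator distribution could be plotted directly in the notebook.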

@epugh
Member

epugh commented Apr 29, 2024

@DmitryKey I am going to be in Europe from May 1 to May 8th, so we could pair on this analysis together if you want.

Right now on the home page we have this messaging:

[screenshot of the home page rating-progress messaging]

What if we could predict how long till all ratings are done?

@DmitryKey
Contributor Author

Hey @epugh !
Great to hear that, let's be in contact regarding the pair up - would love to!

I think update_time and create_time fields are a good start. Some thoughts:

  1. What is the definition of done for rating a query? Is it that all documents were rated, or a period of inactivity?
  2. If an annotator revisits a particular query / document, does that mean we should update the lead time with the additional observed time?

Predicting the ETA for ratings is a fantastic feature to have. It could start with a simple retrospective prediction ("it has taken this long per query / case so far, so we can extrapolate"). The pace may vary per annotator, but should converge on the average across all annotators.
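The retrospective prediction described above could start as simply as extrapolating the observed pace (the function and its numbers below are illustrative, not an existing Quepid API):

```python
def predict_eta_seconds(rated_queries: int, elapsed_seconds: float,
                        total_queries: int) -> float:
    """Naive retrospective ETA: assume the average pace so far holds.

    Returns the estimated seconds of work remaining until all
    queries in the case are rated.
    """
    if rated_queries == 0:
        raise ValueError("no ratings yet, cannot extrapolate")
    avg_per_query = elapsed_seconds / rated_queries
    return avg_per_query * (total_queries - rated_queries)

# Example: 40 of 100 queries rated in one hour of labelling time.
remaining = predict_eta_seconds(rated_queries=40, elapsed_seconds=3600,
                                total_queries=100)
print(remaining)  # 5400.0 seconds, i.e. 90 minutes
```

A per-annotator refinement would compute `avg_per_query` separately for each annotator and sum their individual remaining workloads.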
