Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding filtering features to task listing #170

Merged
merged 8 commits into from
Nov 10, 2022
Merged

Adding filtering features to task listing #170

merged 8 commits into from
Nov 10, 2022

Conversation

kellrott
Copy link
Member

Responding to #57 and porting changes from #104

@kellrott kellrott added this to the 1.1 milestone Nov 11, 2021
@uniqueg
Copy link
Contributor

uniqueg commented Mar 8, 2022

From @kellrott

One of the suggested features around TES 1.1 is the ability to filter job listings by tag key/value pairs.
Each TES Task has a dictionary where arbitrary key/values can be stored for annotating jobs ( https://github.com/ga4gh/task-execution-schemas/blob/develop/openapi/task_execution_service.openapi.yaml#L666 ), it isn't used by the server, or expected to be seen by the backend. Rather it is simply a space for users to store annotations, like workflow, step or tool ids, that may help them when that are scanning the job log for downstream logic.

The ability to filter job listing, using these tags is proposed in PR 170: #170 and specific format for these requests is done in https://github.com/ga4gh/task-execution-schemas/blob/issue/104/openapi/task_execution_service.openapi.yaml#L111

In this version, the schema uses the additionalProperties option to capture all additional non-schema flags from the URL query and stores them in an object. This doesn't seem like the best method. First, it means that there will be a subset of words that can't be used for tags, namely 'name_prefix', 'state', 'page_size' and 'page_token'. Not a big loss, but still a little less than neat. But more importantly, the 'additionalProperties' feature is one of those things that falls in the 'new and not full supported' category for a lot of code generation tools. I noticed this when trying to setup for OpenAPI-Generator based go-server code for Funnel. I worry that including this feature in the specification would break possible implementations, until the code generation gets a lot better.

Does anyone have any thoughts about a better way to support key/value filtering of tags for job listings? Is there a pattern that has been used in any of the other GA4GH APIs?

@uniqueg
Copy link
Contributor

uniqueg commented Mar 8, 2022

@kellrott: Do you have an example call with filters that highlights the namespace issue (i.e., not being able to use, e.g., state and page_size)? Just by looking at the proposed schema I would have guessed that the additional properties are nested under property tags - in which case there wouldn't be any namespace conflict.

As for wanting to avoid additionalProperties: I think that the other Cloud APIs either do not include arbitraty key-value pairs or don't support filtering on them. TRS does filtering based on pre-defined query params - it's not particularly nide either, IMO. I don't really see a way around using additionalProperties here (at least if we want to keep this a GET endpiont, which I think is a given). So even though I share your concern, in my opinion the outlined solution is the way to go.

@kellrott kellrott changed the base branch from develop-1.1 to develop April 20, 2022 22:08
@uniqueg
Copy link
Contributor

uniqueg commented Jul 19, 2022

From @kellrott and @aniewielska: It was suggested during recent Cloud WS meeting to check out how this is done in k8s.

Reference: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#list-and-watch-filtering
(likely too complex, but possibly parts can be used)

@lvarin
Copy link

lvarin commented Aug 3, 2022

Another options would be to use annotations (https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/). I have seen several tools that integrate with others using annotations. For example the Let's Encrypt controller (https://github.com/tnozicka/openshift-acme) reads and writes annotations from the object it manage. I think the main advantage is that labels are often used ti link objects (Pods and services for example), and annotations are pure metadata.

(Just my two cents)

of key values, and will be zipped with an optional tag_value array.
So the query:
```
?tag_key=foo1&tag_value=bar1&tag_key=foo2&tag_value=bar2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the ordering of tag_key and tag_value params is not interleaved like in this example? I suppose the tag_key and tag_value arrays could be independently constructed and then zipped. But what if they aren't of equal length?

Also, is it possible to specify deliberately/explicitly an empty value, and if so, how? Would this be expected to work: ?tag_key=foo1&tag_value=&tag_key=food2&tag_value=bar2? And if so, how about the same for keys, would an empty key be allowed?

Finally, there aren't any guidelines on escaping characters. What if a key or value name includes characters such as ? or &?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tag_key and tag_value arrays would be constructed first, and 'zipped' together. The shorter array would be filled in with blanks and behave in the way described on line 126.

You are correct, treating empty strings as 'wildcards' means that you can't search specifically for empty strings. I'm not really sure of an alternative, but I'm not sure if this is a blocker, as you could simple search for the existence of the tag and filter values manually on the client side.

I think character escaping would work via traditional HTML escaping. IE ? would be ? and & would be &

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kellrott so for this example below what is the expected behavior?

{"foo": "", "baz": ""}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the input {"foo": "", "baz": ""} we expect that both keys foo and baz exist. The values will be ignored and all instances of tasks with both foo and baz keys defined will be returned.

This could be coded as ?tag_key=foo&tag_key=baz with no tag_value set

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. This LGTM.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine with me too :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants