Improve attribute data structure for gtf_scan #177

jdcla · 2024-09-12T19:04:27Z

Currently, gtf_scan returns the attribute column as a list of structs, each with two fields ('key' and 'value'). Unfortunately, this layout makes it hard to unnest the data into a tabular format. It would be more convenient to have the data either fully unnested or in a structure that is easier to query.

For example, it might make more sense to store the attributes in a single struct whose field names are the 'key' values, making it possible to retrieve an item directly with polars (e.g. pl.col('attributes').struct.field('gene_id')). As it stands, the list has to be iterated and searched for a specific key, which is an expensive operation.

Lastly, when a key has multiple values (e.g. tag, with the values ccds and basic), each value currently ends up in its own key/value struct.
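One possible way to handle repeated keys, sketched in plain Python: collect every value for a key into a list, so that tag would map to both of its values in one place. The function name group_attributes and the sample pairs are hypothetical, and mapping every key to a list (even single-valued ones) is just one design choice that keeps the field types uniform.

```python
from collections import defaultdict


def group_attributes(pairs):
    """Group (key, value) pairs so that repeated keys share one list.

    Hypothetical helper for illustration; not part of any library API.
    """
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    # Convert to a plain dict; insertion order of first-seen keys is kept.
    return dict(grouped)


# Made-up attribute pairs for a single GTF record.
pairs = [
    ("gene_id", "ENSG01"),
    ("tag", "ccds"),
    ("tag", "basic"),
]

grouped = group_attributes(pairs)
```

Here grouped holds a single entry for tag containing both values, instead of two separate key/value entries.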
