Currently, `gtf_scan` returns the attribute column as a list of structs, each with two fields (`key` and `value`). Unfortunately, this layout makes it hard to unnest the data into a tabular format. Having the attributes either fully unnested or in a more convenient structure would make them much easier to work with.
For example, it might make more sense to store the attributes in a single struct whose field names are the `key` values. An item could then be retrieved directly with Polars (e.g. `pl.col('attributes').struct.field('gene_id')`). Currently, the list has to be iterated and searched for a specific key, which is an expensive operation.
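A minimal plain-Python sketch (no Polars) of the difference for a single row. The helper names `lookup_linear` and `to_struct` and the attribute values are hypothetical placeholders, not part of `gtf_scan`. The current layout needs a linear scan per lookup, while the proposed one is a direct field access:

```python
# Hypothetical attribute values for one GTF row, in the current list-of-structs shape.
attributes = [
    {"key": "gene_id", "value": "G1"},
    {"key": "transcript_id", "value": "T1"},
    {"key": "gene_name", "value": "GENE1"},
]


def lookup_linear(attrs, key):
    """Current shape: scan the list until the key matches (O(n) per lookup)."""
    for item in attrs:
        if item["key"] == key:
            return item["value"]
    return None


def to_struct(attrs):
    """Proposed shape: a single mapping keyed by attribute name.

    If a key repeats, this simple version keeps only the last value.
    """
    return {item["key"]: item["value"] for item in attrs}


row = to_struct(attributes)
print(lookup_linear(attributes, "gene_id"))  # G1
print(row["gene_id"])  # G1
```

In an actual Polars struct column, every row shares one fixed set of fields, so a key that is absent from a given row would presumably appear as null.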
Lastly, when a key has multiple values (e.g. `tag`, with the values `ccds` and `basic`), each value currently ends up in its own key/value struct instead of being grouped under a single key.
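One possible way to handle repeated keys, sketched in plain Python under the assumption that grouping into lists is the desired behavior. The helper `to_struct_grouped` and the sample values are hypothetical. Every field becomes a list, so a given key has the same type in every row, which matters for a columnar schema:

```python
from collections import defaultdict

# Hypothetical attributes for one transcript carrying two `tag` entries.
attributes = [
    {"key": "gene_id", "value": "G1"},
    {"key": "tag", "value": "ccds"},
    {"key": "tag", "value": "basic"},
]


def to_struct_grouped(attrs):
    """Group values by key so repeated keys collapse into one field.

    Every field becomes a list, so a given key has the same type in every row
    (a single-valued key simply yields a one-element list).
    """
    grouped = defaultdict(list)
    for item in attrs:
        grouped[item["key"]].append(item["value"])
    return dict(grouped)


print(to_struct_grouped(attributes))
# {'gene_id': ['G1'], 'tag': ['ccds', 'basic']}
```

An alternative would be to use lists only for keys known to repeat (such as `tag`) and plain strings elsewhere. That makes common lookups simpler, but it requires knowing the multi-valued keys up front.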