Differentiate between one-row and multi-row structs #3

krlmlr · 2024-02-19T22:04:30Z

Use a different class for each. For one-row structs, $ into list columns would unpack the value; for multi-row structs, we could offer a generic accessor function that selects rows by equality with the first column.

krlmlr · 2024-02-19T22:44:38Z

It's only useful if autocomplete actually picks it up.

moodymudskipper · 2024-02-20T10:13:17Z

Is the auto-unnesting $ worth it? It comes with hidden surprises. To be consistent we would need [[ to unnest as well, and that means lapply() and map() would behave differently at a low level (lapply() uses [[, map() iterates on the true elements). df$nested_col[[1]] would return the first column of the nested df, to the surprise of a user expecting regular df behavior, which might occasionally translate into silent bugs.
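To make the concern concrete, here is a minimal base R sketch of an unpacking $ on a one-row data frame. The class name "one_row" and the helper as_one_row() are hypothetical and not part of the struct package; the sketch only shows how $ and [[ drift apart once $ alone unpacks list columns.

```r
# Hypothetical class for illustration; "one_row" is not a class from struct.
as_one_row <- function(df) {
  stopifnot(is.data.frame(df), nrow(df) == 1L)
  class(df) <- c("one_row", class(df))
  df
}

# `$` unpacks list columns: it returns the single element, not a length-1 list.
`$.one_row` <- function(x, name) {
  col <- .subset2(x, name)
  if (is.list(col) && !is.data.frame(col)) col[[1L]] else col
}

df <- data.frame(id = 1)
df$nested <- list(data.frame(c = 2, d = 3))  # a list column holding one data frame
one <- as_one_row(df)

via_dollar  <- one$nested       # unpacked: the inner data frame
via_bracket <- one[["nested"]]  # not overridden: still the length-1 list column
```

Unless [[ gets the same treatment, the two accessors return different objects for the same column, which is the inconsistency discussed above.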

An accessor that selects by the first column looks a lot like row names.
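A small base R sketch of that resemblance, assuming a hypothetical helper pick_row() (the name is illustrative, not an existing function): selecting rows by equality with the first column gives much the same result as indexing by row name.

```r
# Hypothetical accessor: select rows whose first column equals `key`.
pick_row <- function(x, key) {
  x[which(x[[1L]] == key), , drop = FALSE]
}

keyed <- data.frame(id = c("a", "b"), value = c(10, 20), stringsAsFactors = FALSE)
named <- data.frame(value = c(10, 20), row.names = c("a", "b"))

by_key  <- pick_row(keyed, "b")         # key lives in an ordinary column
by_name <- named["b", , drop = FALSE]   # key lives in the row names
```

The difference is where the key is stored: in a regular column that survives a round trip through a database, or in an attribute that does not.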

krlmlr · 2024-02-20T10:37:25Z

Do we really need [[ to be consistent with $? Let's first see if autocomplete would pick it up before continuing the discussion.

Databases don't have row names, tibbles will never have them.

moodymudskipper · 2024-03-07T09:59:34Z

Now we have list_structs, which are lists, so there is no need for unpacking or unnesting.
And we have tibble_structs, whose rows we can extract into list_structs.
A problem is that we don't have packed columns anymore. To have packed columns in tibble_structs we'd need a new class in list_struct so we can have the round trip, similar to how we have a "scalar" class to signal that we don't want to nest the value when going from list_struct to tibble_struct.

krlmlr · 2024-03-07T10:33:23Z

A reprex would be nice ;-)

moodymudskipper · 2024-03-07T11:22:56Z

Yes, and my last statement about packed columns was misguided too. Here's a summary:

list_structs are classed lists, stricter than plain lists, and print like one-row dfs with some custom pillar methods.

The printing method might be improved. Maybe the best option is to display an improved tree
(something similar to str, but tailored for structs). We can also have a glimpse method.

library(struct)
foo <- list_struct(
  a = scalar(1),
  a2 = 1, 
  b = tibble_struct(c = 2, d = 3), 
  e = list_struct(f = 4, g = 5)
)
foo
#> # list_struct object: 4 element(s)
#>   a            a2        b                  e             
#> * <dbl>        <dbl>     <tbbl_str[,2]>     <named list>  
#> 1 <scalar [1]> <dbl [1]> <tbbl_str [1 × 2]> <lst_strc [2]>
print_tree(foo)
#> █─ foo <lst_strc>
#> ├─── a <scalar>
#> ├─── a2 <dbl>
#> ├─█─ b <tbbl_str[,2]>
#>   ├─── c <dbl>
#>   ├─── d <dbl>
#> ├─█─ e <lst_strc>
#>   ├─── f <dbl>
#>   ├─── g <dbl>

We can bind list_structs into tibble_structs. Scalars are not nested; everything else is nested.
We need some pillar methods here to differentiate the result from a standard tibble.

bar <- bind_structs(foo, foo)
class(bar)
#> [1] "tibble_struct" "tbl_df"        "tbl"           "data.frame"

# this tibble can be subset normally
bar[1,]
#> # A tibble: 1 × 4
#>       a a2        b                  e             
#>   <dbl> <list>    <list>             <list>        
#> 1     1 <dbl [1]> <tbbl_str [1 × 2]> <lst_strc [2]>
bar[[1]]
#> [1] 1 1

To go back to a list_struct we need to use extract, where the extract expression is evaluated in bar and must return an integerish or logical index.

identical(bar[extract = 1], foo)
#> [1] TRUE
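As a rough illustration of how such an argument could work, here is a base R sketch of a `[` method with an extract argument that is evaluated in the context of the table's columns. The class "toy_tbl" is hypothetical and this is not the struct implementation; it returns a plain named list where struct would return a list_struct.

```r
# Hypothetical class; a toy stand-in for tibble_struct.
`[.toy_tbl` <- function(x, ..., extract) {
  if (missing(extract)) {
    return(NextMethod())  # regular data frame subsetting
  }
  cols <- unclass(x)  # plain list of columns
  # Evaluate `extract` with the columns in scope, then the caller's environment.
  idx <- eval(substitute(extract), cols, parent.frame())
  if (is.logical(idx)) idx <- which(idx)
  stopifnot(is.numeric(idx), length(idx) == 1L)
  # One element per column: a single "row" returned as a named list.
  lapply(cols, function(col) col[[idx]])
}

toy <- data.frame(id = c("x", "y"), value = c(10, 20), stringsAsFactors = FALSE)
class(toy) <- c("toy_tbl", class(toy))

first  <- toy[extract = 1]          # integerish index
second <- toy[extract = id == "y"]  # logical expression evaluated in `toy`
```

Because the result is a named list, completing names after `$` works the same way as for any other list.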

Note that we have autocomplete after bar[extract = 1]$

I think bar is not correct at the moment: nested list_structs should become packed columns rather than nested ones. We should have

#> A tibble_struct: 2 × 4
#>       a a2        b                    e$f    $g
#> * <dbl> <list>    <list>             <dbl> <dbl>
#> 1     1 <dbl [1]> <tbbl_str [1 × 2]>     4     5
#> 2     1 <dbl [1]> <tbbl_str [1 × 2]>     4     5

where e is a packed tibble_struct
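For readers less familiar with the distinction, this base R sketch contrasts a packed column (a data frame stored as a column, one row per outer row) with a nested list column (one list element per row). It uses plain data frames only, not the struct classes.

```r
# Packed: `e` is itself a data frame with the same number of rows.
packed <- data.frame(a = c(1, 1))
packed$e <- data.frame(f = c(4, 4), g = c(5, 5))

# Nested: `e` is a list column, one list per row.
nested <- data.frame(a = c(1, 1))
nested$e <- list(list(f = 4, g = 5), list(f = 4, g = 5))

packed_f <- packed$e$f        # a plain vector, one value per row
nested_f <- nested$e[[1]]$f   # must first pick a row, then the field
```

With a packed column, e$f is directly available as a vector, which is what the e$f and $g headers in the desired printout express.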

krlmlr · 2024-03-12T11:19:17Z

Agree that e should be a packed column.

How about drop = TRUE instead of extract = 1?
