I have a simple use case: I want to pull down all rows of a column from a Lance dataset, do some custom processing locally, and add the resulting rows (same length) directly back to the same Lance dataset. (The local processing involves a windowing operation, which is why the straightforward SQL update syntax won't work.)
It looks like the best-supported way to do this is via the `merge` syntax, where I can feed in a precomputed data frame and join on index columns. That works, but I couldn't figure out how to do it without manually generating a custom index column that ends up being essentially the same as `_rowaddr`, except materialized in the dataset instead of being a meta column.
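A minimal plain-Python sketch of the workaround described above, for illustration only: it does not call the Lance API. It models the dataset as a list of dicts, materializes a hypothetical `row_idx` column, computes a trailing rolling mean (a stand-in for the windowing step, since each output depends on neighboring rows), and joins the result back on `row_idx`, which is roughly the role `merge` plays here. The helpers `rolling_mean` and `merge_on_key` and the column names `row_idx`, `x`, `x_smooth` are all hypothetical.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def rolling_mean(values: List[float], window: int) -> List[float]:
    """Trailing rolling mean: output i averages values[i - window + 1 .. i].

    Each output depends on neighboring rows, so a per-row SQL UPDATE
    expression cannot compute it.
    """
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def merge_on_key(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Left-join the columns of `right` into `left` on the shared `key` column."""
    lookup = {row[key]: row for row in right}
    merged = []
    for row in left:
        extra = lookup.get(row[key], {})
        merged.append({**row, **{k: v for k, v in extra.items() if k != key}})
    return merged


# Existing "dataset"; row_idx is the redundant, manually materialized index column.
dataset: List[Row] = [{"row_idx": i, "x": float(v)} for i, v in enumerate([1, 2, 3, 4, 5])]

# 1. Pull down the column (ordered by the index) and process it locally.
ordered = sorted(dataset, key=lambda r: r["row_idx"])
smoothed = rolling_mean([r["x"] for r in ordered], window=2)

# 2. Build the new column keyed by the same index.
new_col = [{"row_idx": r["row_idx"], "x_smooth": s} for r, s in zip(ordered, smoothed)]

# 3. Merge it back on the index column.
dataset = merge_on_key(dataset, new_col, "row_idx")
```

The `row_idx` column exists only so the join has something to match on. Exposing `_rowaddr` to `merge` would make it unnecessary.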
It seems strange to me that an operation as simple as adding a column with the same row count as the existing dataset is so complicated. If `_rowaddr` were exposed to `merge` operations, it would at least save the inconvenience of manually generating a redundant row-index column.