Stricter recycling rules #98

sebffischer · 2024-03-19T07:26:34Z

Currently, the code below works:

> c(1, 2) * c(1, 2, 3)
[1] 1 4 3

I think this should be stricter and at least throw a warning (but I would probably even tend to an error) when the shorter length does not divide the longer length.

sebffischer · 2024-03-19T07:54:42Z

Maybe one wants to be even stricter, i.e. in a operation x + y, either length(x) == length(y) or length(x) == 1 | length(y) == 1. This would be in line with the broadcasting rules implemented in numpy or pytorch.

A strong argument for being stricter is that the recycling rules for vectors are then more consistent with the recycling rules for n-dimensional arrays. For example, it should also not be possible to multiply two matrices with shapes (2, 3) and (2, 4).

dgkf · 2024-03-19T22:03:52Z

Great catch! I remember thinking about this when implementing the current recycling... and promptly forgetting about it.

This style works great for objects that have a known length... which for now should account for all objects. But my goal is that objects eventually are calculated lazily, and possibly even from generators where the length is unknown.

All operators are currently set up with this laziness in mind. They are implemented on any representation that implements IntoIterator, with the goal that eventually a "vector" will be able to be represented as a generator.

That said, we can add something that is like... try_check_recycle_size_compatibility that checks for the size when both Iterator representations have a size hint, throwing an error if both sizes are known but incompatible, but otherwise silently continues with iterators of unknown size until both iterators are exhausted - which point we will know their size and can raise an appropriate error.

sebffischer · 2024-03-24T18:52:16Z

One thing that one also has to pay attention to is to make the lazy evaluation work correctly with error handling.

In the code below (assuming stricter recycling rules), the multiplication x * y should throw an error.
However, because y is created from masking 1:3 it should have unknown size (unless we want to calculate the sum of the logical vector, which I guess we don't).
This means, that when calling tryCatch(), the length of y is unknown and one cannot perform compliance checks with the recycling rules when constructing z.
Only later, when mean() will be called on z, would we notice that something went wrong.
However, the stop() call comes before that, so the error handling code would never be executed.

y <- (1:3)[c(TRUE, TRUE, TRUE)]
x <- 1:10
z <- tryCatch(x * y,  # this should throw
  error = function(e) {
     messagef("Catch all them errors")
     NA
  }
)

stop("wupsie")

mean(z)

sebffischer · 2024-06-15T12:08:55Z

I have started to think about this again and there is another relevant case that is also relevant to error-handling.
This is mostly related to your suggestion from above:

That said, we can add something that is like... try_check_recycle_size_compatibility that checks for the size when both Iterator representations have a size hint, throwing an error if both sizes are known but incompatible, but otherwise silently continues with iterators of unknown size until both iterators are exhausted - which point we will know their size and can raise an appropriate error.

The one possible problem from above regarding the tryCatch() can in principle be solved for forcing the evaluation of the expression x * y, the same problem will also arise in other contexts I believe.

Let's say we do something like

print(1:3[ [true, true, true] ] + 1:2[ [true, true]])

With the recycling rules that I suggested, this should throw an error, as the lengths of the left and right subsets are 3 and 2.
What I believe should happen here eventually is that

1:3 and 1:2 are represented using RepType::Iterator
Subsetting the Iterator using the Mask, should adapt those iterators akin to a rust .filter(), creating two other RepType::Iterator representations.
Adding the two filtered iterators produces the final iterator
The print() call then iterates over the final iterator and starts printing [1] 1 4 but only then do we notice that an error should be thrown.

The whole problem becomes worse if we - instead of printing the final vector - subset the first element as shown below:

(1:3[ [true, true, true] ] + 1:2[ [true, true]])[1]

instead of printing the first two elements and then throwing an error, this operation will never notice that the addition of the two masked vectors is an invalid operation according to the stricter recycling rules, because it will only yield the first element of the final iterator.

So if we want both the stricter recycling rules and usage of RepType::Iterator, I think we always need to verify the validity of a vector representation (such as RepType::Iterator) when creating the representation, not later when yielding its elements.

sebffischer · 2024-07-28T06:08:43Z

@dgkf Do you agree that these examples seem to make it necessary to always have a known length?
I would maybe try to work on that next

dgkf added theme-internals Relates to internal operations of the language type-design Discussion regarding design of enhancements or project at large meta-proposal Language proposals status-feedback-needed Needs further discussion before moving further labels Mar 19, 2024

sebffischer mentioned this issue Mar 26, 2024

Bug in len() implementation / Implementing the length() primitive #105

Open

sebffischer mentioned this issue Oct 15, 2024

Enforce stricter recyling rules #201

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Stricter recycling rules #98

Stricter recycling rules #98

sebffischer commented Mar 19, 2024

sebffischer commented Mar 19, 2024 •

edited

Loading

dgkf commented Mar 19, 2024

sebffischer commented Mar 24, 2024 •

edited

Loading

sebffischer commented Jun 15, 2024

sebffischer commented Jul 28, 2024

Stricter recycling rules #98

Stricter recycling rules #98

Comments

sebffischer commented Mar 19, 2024

sebffischer commented Mar 19, 2024 • edited Loading

dgkf commented Mar 19, 2024

sebffischer commented Mar 24, 2024 • edited Loading

sebffischer commented Jun 15, 2024

sebffischer commented Jul 28, 2024

sebffischer commented Mar 19, 2024 •

edited

Loading

sebffischer commented Mar 24, 2024 •

edited

Loading