count(f, x) should not be equivalent to sum(f, x) #20404
i.e. define

    iszero(b::Bool) = !b # maybe more efficient than the fallback b == zero(b) definition
    function count{F}(pred::F, itr)
        n = 0
        for x in itr
            n += !iszero(pred(x))
        end
        return n
    end
    count(itr) = count(identity, itr)
    @deprecate countnz(itr) count(itr)
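As a rough sketch in current Julia syntax, the nonzero-counting semantics proposed in this comment might look like the following. The name count_nonzero is hypothetical, chosen only to avoid clashing with Base.count; it is not an existing Base function.

```julia
# Hypothetical helper (not part of Base): counts the elements for which
# pred(x) is nonzero, so Bool and numeric results are both accepted.
function count_nonzero(pred, itr)
    n = 0
    for x in itr
        # !iszero(...) is always a Bool, so n stays an Int
        n += !iszero(pred(x))
    end
    return n
end

# One-argument form: count the nonzero elements themselves.
count_nonzero(itr) = count_nonzero(identity, itr)

count_nonzero(sqrt, [1, 2, 3, 5])   # every square root is nonzero
count_nonzero(iseven, 1:10)         # a boolean predicate still works
count_nonzero([0, 1.5, 0, 2])       # plays the role of countnz
```

Under this definition a non-boolean predicate no longer behaves like a sum: each element contributes either 0 or 1.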
+1, that would be more consistent with |
I think I'd prefer something like

    function count(f, itr)
        n = 0
        for x in itr
            if x  # as posted; f is never applied here, f(x) was presumably intended
                n += 1
            end
        end
        return n
    end
    count(itr) = count(identity, itr)
    @deprecate countnz(itr) count(!iszero, itr)

This has the benefit of throwing an error for non-boolean data when using |
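A minimal sketch of the stricter behaviour argued for in this comment, assuming the predicate is meant to be applied inside the branch. The name strict_count is hypothetical and is used here only so the example does not shadow Base.count.

```julia
# Hypothetical helper (not part of Base): counts elements for which f(x) is true.
# Because f(x) is used directly as an `if` condition, any non-Bool result
# raises a TypeError instead of being silently added up.
function strict_count(f, itr)
    n = 0
    for x in itr
        if f(x)
            n += 1
        end
    end
    return n
end

strict_count(iseven, 1:10)          # boolean predicate: fine
strict_count(!iszero, [0, 1, 2])    # explicit replacement for counting nonzeros

# A non-boolean predicate is rejected rather than summed.
err = try
    strict_count(sqrt, [1, 2, 3])
catch e
    e
end
err isa TypeError
```

The trade-off raised later in the thread is that the branch may prevent vectorization, which is what the type-asserted, branch-free variant tries to address.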
@ararslan, that might actually be slower because it forces a branch. One could do However, I'd prefer to have one function that does more rather than restrict the functionality to |
I disagree because without a predicate for non-boolean data, the name Having |
@ararslan, your version with the branch (corrected to call |
Dang. |
That's https://github.com/JuliaLang/Juleps/blob/master/Find.md, but it doesn't really address the question of whether |
Because it's unclear. I think being more explicit and saying |
Now that we can do |
Probably should continue to have a |
For sparse there's |
There's an important difference between |
…f non-boolean values are encountered (fixes JuliaLang#20404)
I'd imagine most uses of count would be for the boolean case, in which case |
@ararslan Why do you think having |
@nalimilan Good point. I think |
@nalimilan I think it's a good idea because it provides both clarity and safety in terms of knowing exactly what you're getting. I think counting nonzeros is a rather odd and surprising default. I assume the reason that we don't currently have a one-argument |
To solve the speed point, just add

    function fast_count(pred, itr)
        n = 0
        @simd for x in itr
            n += pred(x)::Bool
        end
        return n
    end

    using BenchmarkTools  # provides @benchmark
    a = randn(1000000)
    @benchmark fast_count(x->x>1.96, a) # median time: 535.944 μs (0.00% GC)
    @benchmark sum(x->x>1.96, a) # median time: 565.832 μs (0.00% GC)
    @benchmark count(x->x>1.96, a) # median time: 1.370 ms (0.00% GC)
I wouldn't have expected |
There are many SIMD instructions for integers. |
Yes, but I would expect them to be enabled by default. Do they change the behavior of the code? |
Perhaps some aliasing checks are turned off. I'm not sure, I don't get SIMD without the macro at least. |
I took a look at the count implementation for #20403 (cc @cossio), and was surprised to discover that the current definition is nearly identical to sum (albeit with a type instability):

For example, count(sqrt, [1,2,3,5]) returns 6.382332347441762.

We could disallow this case entirely, but one possibility would be to make count(f, x) equivalent to countnz for non-boolean data. Indeed, I'm not sure why we have both countnz and count. Couldn't we merge the two functions into a single count function?
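To illustrate the behaviour described in this report, here is a sketch of a loop that accumulates the predicate's result directly. It is an assumption about the shape of the definition being discussed, not a copy of it, and the name accumulating_count is hypothetical.

```julia
# Hypothetical reconstruction (an assumption, not the actual Base code):
# the predicate's return value is added to n as-is, so a non-boolean
# predicate turns the "count" into a sum.
function accumulating_count(pred, itr)
    n = 0                 # starts as an Int ...
    for x in itr
        n += pred(x)      # ... but may become a Float64 here (type instability)
    end
    return n
end

accumulating_count(iseven, 1:10)                 # behaves like a count for Bool results
total = accumulating_count(sqrt, [1, 2, 3, 5])   # behaves like sum(sqrt, ...)
total ≈ sum(sqrt, [1, 2, 3, 5])
```

With a boolean predicate the loop counts matches, but with sqrt it yields the sum of the square roots, which is the equivalence the issue title objects to.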