-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sparse-constructor does not always purge zeros #9928
Comments
If handling this case requires a lot of extra work and reallocations, I wouldn't say this is necessarily a bug. |
What do Matlab, Octave, or SciPy do here? |
I think matlab never stores explicit zeros. Certainly removes them for this case. SciPy is in the other camp and never removes zeros unless if explicitly told with Looks like Julia is in the middle. |
Is it just me or does being in the middle seems like the worst of both worlds? Both of the other behaviors are easy to predict and understand. When you currently ask "does this zero end up stored", the only answer we can give is "it depends". |
To some extent, yes. I think the idea at the moment is we are trying to have a high-level, user-friendly interface for sparse matrices that does remove zeros, and a low-level, high-performance way of working with sparse matrices for experts and library authors that doesn't. If we're going to try to pick a direction and move more consistently one way vs the other, I'd very much prefer going more in the SciPy direction than the Matlab direction. |
Yes, I agree, SciPy direction would be better. Also would align with Julia's philosophy of not being too magical. A bit off topic, here what happens in SciPy if one assigns to a not allocated spot:
|
I'm not really bothered by the behavior because zeros are harmless, but I also prefer the SciPy approach. |
They are not completely harmless. Stored zeros will make some of our solvers trip, IIRC. We do need to remove these zeros in the default user-facing APIs. If someone wants to use stored zeros, I think it is reasonable to expect that they will work with the CSC data structure. |
I don't think removing them automatically is the right answer here. If there are a handful of places where they cause problems, then those call sites should explicitly canonicalize and remove stored zeros on entry, and these limitations should be documented. |
Also agreed that SciPy-like is the way to go here, and +1 for @tkelman's proposal. |
What do stored zeros mean here? In the example presented by @mauro3, did the user intend to have a stored zero, or expected that the zero be removed? The current sparse matrix design removes zeros that may appear as a result within an operation, much like Matlab and octave. There are cases like this one, where the stored zero is unintentional, and should be considered a bug in the current design. If we choose to go the SciPy route, then we could decide not to check for zeros that may appear in the course of an operation, and that they become stored zeros. In such a case, I am guessing that a lot many users are familiar with the Matlab/Octave design, but developers like the SciPy choice, as it is much more flexible. I personally would be ok if we all want to change to the SciPy style. However, calling the current behaviour ok, because we allow for stored zeros is confusing, as @StefanKarpinski notes. It should be well defined, consistent across all operations, and predictable. |
As an end user, I want to throw my 2 cents in: definitely confusing that I can't figure out whether to expect nzval's to always be non-zero or not. As pointed out in my email to the julia-users google group, this behavior is particularly confusing to an end user (especially that the help for nnz and countnz imply that they are equivalent for sparse matrices) help?> nnz A = spdiagm(ones(5)); |
Part of the problem is when you say We might need either different ways to say "0" where one means stored and one means non-stored. Or different behaviors for different methods of assignment, which is what we have now. The documentation should certainly be clearer that a "stored nonzero" can have a value equal to 0. |
@zoohair, everyone wants something different from sparse matrices, so it would help to hear a bit about what you're trying to do with them. Currently, sparse matrices will only have stored zeros if the user explicitly puts them there. |
Sure. I'm using sparse matrices to store transition probabilities T in an MDP. The state space is huge, but only few transitions are possible. When it comes to solve the MDP there's a matrix-vector T*v multiplication to be done (v is not sparse). This is done one row at a time for algorithmic purposes (speeds up convergence). In the past, I have simply use dot() and passed it the appropriate row of T and v, but I have an application where the values in v are repeated (i.e. v = [v1, v1, v1, v1, ...]), and rather than storing all of v, I want to just store the subvector v1 and still carry out the dot product. So I just want to iterate over the relevant columns of T and index into v1 appropriately to carry out the dot product. So if there are zeros stored in the matrix, it doesn't make a difference since they won't change the result. But the fact that in order to get the non-zero indices findn needs to check all entries to be != 0 seems inefficient to me... I'm find if findn returned indices to the "structural nonzeros", but of course I can see how this would be an issue for other users who expect the entries to be non-zero. |
Could you post some pseudocode on how you plan on extracting a row using |
See also the discussion of |
@mlubin: I realized that Julia is using CSC instead of CSR, so finding the rows for a given column is easier. So I'm just going to store the transpose of the matrix I care about and use this instead:
|
@zoohair, yes that seems sensible. Be sure to avoid accessing elements as |
To fix this, we just need a pass to remove any stored zeros in the |
Don't we alread have Line 337 in 3aed3f5
We also have |
Thanks. I was doing this at an airport with my laptop running out of charge. For now, I'll incoprorate this, and the larger discussion on stored zeros can continue elsewhere later. |
closed by #15242 |
Sparse constructor
sparse
should purge all zeros but it does not when combining two entries to zero:expected would be
The text was updated successfully, but these errors were encountered: