fix #14111 dividing sparse vector by constant should give sparse vector #14963
Conversation
This shouldn't throw an error for divide by 0 or NaN. It should create a SparseVector with dense data.
It seems like a bad move to unpredictably create a really dense vector.
It's not unpredictable, it's what the user asked for when they divided by 0. I would say it's what we do for dividing a sparse matrix by 0, but it looks like that regressed. Edit: nope, it's always been wrong.
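(For concreteness, a minimal sketch of the semantics being argued for, written in current Julia syntax with the stdlib SparseArrays; at the time of this PR the sparse code lived in Base:)

```julia
using SparseArrays

v = sparsevec([1, 3], [2.0, 4.0], 4)  # stored entries at indices 1 and 3
w = v ./ 0
# w is still a SparseVector{Float64}, but every index is now a stored entry:
# Inf at the previously stored positions, NaN at the structural zeros.
typeof(w), nnz(w)                     # (SparseVector{Float64, Int64}, 4)
```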
This could be fixed by allowing custom (i.e. not only zero) values for sparse entries.
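(For illustration, a hypothetical sketch of that idea; the type and all names here are invented, not part of Base or any package:)

```julia
using SparseArrays

# Hypothetical sketch: a sparse vector whose unstored entries read as
# `default` rather than zero, so dividing by 0 can stay O(nnz).
struct FilledSparseVector{Tv,Ti} <: AbstractVector{Tv}
    data::SparseVector{Tv,Ti}
    default::Tv
end

Base.size(v::FilledSparseVector) = size(v.data)

function Base.getindex(v::FilledSparseVector, i::Integer)
    inds = SparseArrays.nonzeroinds(v.data)
    k = searchsortedfirst(inds, i)
    k <= length(inds) && inds[k] == i ? nonzeros(v.data)[k] : v.default
end

# Scalar division touches only the stored values and the default value.
Base.:/(v::FilledSparseVector, x::Number) =
    FilledSparseVector(SparseVector(length(v.data),
                                    copy(SparseArrays.nonzeroinds(v.data)),
                                    nonzeros(v.data) ./ x),
                       v.default / x)
```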
Sure, but the majority of use cases are going to call for a zero default. I don't think making a few corner cases a little cleaner is worth imposing a potentially non-zero default on all users. Multiplication (and arguably division) …
I'm not really concerned about the corner case of non-zero sparse entries, but about the fact that another corner case (dividing by zero or multiplying by a non-finite value) makes it impossible for division and multiplication by a scalar to return a sparse vector/matrix. I'm not saying that's the best solution, but returning a dense vector/matrix sounds silly.
If you can suggest any other ways of ensuring …
For example:

```
>> speye(4) ./ 0
ans =

   (1,1)  Inf
   (2,1)  NaN
   (3,1)  NaN
   (4,1)  NaN
   (1,2)  NaN
   (2,2)  Inf
   (3,2)  NaN
   (4,2)  NaN
   (1,3)  NaN
   (2,3)  NaN
   (3,3)  Inf
   (4,3)  NaN
   (1,4)  NaN
   (2,4)  NaN
   (3,4)  NaN
   (4,4)  Inf
```
Personally, I think this PR gets it right. If you want to divide a sparse matrix by 0, convert it to full first.
Never mind, I see what you're doing in #14973. Carry on.
scipy.sparse has a set of warnings for "slow things you shouldn't do on sparse matrices" that we could maybe start adopting, but erroring seems like it might be asking for trouble if a value only ends up exactly equal to zero partway through some long calculation.
Raising an error depending on a value being exactly 0 is bad, but it isn't worse than returning a dense matrix, which might even trigger an OOM if it's really large. And isn't type stability in Base a must?
Yes. See the approach in #14973: it maintains type stability by always returning a sparse result. Edit: we don't error for `.+` of a sparse matrix and a scalar either, even though that densifies.
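(A minimal sketch of that kind of type-stable approach; this is not the actual #14973 implementation, and the function name is invented:)

```julia
using SparseArrays

# Sketch: scalar division that always returns a SparseMatrixCSC. When the
# divisor maps structural zeros to something nonzero (0/0 == NaN), every
# entry gets stored; the return *type* never depends on the value of x.
function divide_by_scalar(A::SparseMatrixCSC{Tv}, x::Number) where Tv
    if iszero(zero(Tv) / x)
        # structural zeros stay zero: divide only the stored values
        SparseMatrixCSC(A.m, A.n, copy(A.colptr), copy(A.rowval), nonzeros(A) ./ x)
    else
        # x is 0 (or NaN): no structural zeros remain, so store every entry
        sparse(Matrix(A) ./ x)
    end
end
```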
True, but I think no one would expect `.+` to keep the matrix sparse, whereas mul/div hitting this case is a lot more likely to be a programming error.
Are there any other examples where we currently throw value-dependent errors when the correct result is representable (just expensive to construct) within the domain of the desired output type? I don't think it's our place to guess the user's intent about why they are dividing by zero or multiplying by Inf or NaN.
I think `sqrt(-1.0)` throwing a `DomainError` is an example.
That seems to be a slightly different case, as there you get more information by lifting the desired output type to a different domain where you get a more precise correct result (NaN is the closest-to-correct possible representation of an imaginary number within a real data type, but it's a lot less correct than an imaginary number represented in a complex data type). Here you don't gain anything by being pedantic and erroring to tell the user to manually convert to full first.
I'd much rather have the program error out than try to allocate way too much memory, start swapping and become unusable. Yes, OSes should be smart enough to avoid that, but the reality is they're not. If I really want to generate a full matrix, then I can explicitly ask for that by converting to full first.
Representing a full matrix in sparse format is only ever a constant factor more expensive than manually doing the conversion to full. Where else do we refuse to calculate something because it's inconveniently inefficient at a specific value? We don't error for adding a constant to a sparse matrix, or for taking the cosine of a sparse matrix, etc. We do the conversion to full when the operation the user asked for requires it. If we can give a type-domain MethodError then I often agree a MethodError is preferable to a slow dense fallback, but this is the value domain, and we don't usually throw this kind of exception for operations whose result we can represent but just don't want to.
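(For concreteness, a rough back-of-the-envelope estimate of that constant factor, assuming Float64 values and 64-bit CSC indices; the numbers are illustrative, not from the thread:)

```julia
# Storing every entry of an m×n Float64 matrix in CSC format vs. densely.
m, n = 10_000, 10_000
dense_bytes  = 8 * m * n                    # one Float64 per entry
sparse_bytes = 8m*n + 8m*n + 8 * (n + 1)    # nzval + rowval (Int64) + colptr
sparse_bytes / dense_bytes                  # ≈ 2.0, a constant factor
```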
My way of thinking here is that if one is serious about sparse matrices, one cannot assume the data will fit into memory after conversion to a dense format. Triggering OOM is painful and may have serious consequences: making debugging painful, creating problems for unrelated processes, making a machine unresponsive, annoying other users of a shared machine, etc. Also, tests may succeed on relatively small matrices while pathological failures happen in production on bigger data sets. Raising an error is clearer, and the message can indicate the workaround.
If returning the actual result in sparse format would trigger an OOM, converting to full is almost guaranteed to do the same, so we'd be giving an error that suggests doing something that usually won't work either in the OOM case. If you're serious about sparse matrices then there are lots of operations you as a user shouldn't do, but as a language we shouldn't be erroring to prevent users from doing them. Warnings maybe, but erroring is very much in the "language designer knows better than user" direction that I wouldn't think we'd want to go in.
All of our numeric promotion rules are driven by this exact concern.
But this is completely predictable from the types of the arguments, not their values. If you add a constant to a sparse matrix, you always get a dense answer; likewise, if you take the cosine of a sparse matrix you always get a dense answer. Suppose you're doing a sparse computation as part of a service. It's way better to get an easily trapped, clear error if the divisor ends up being 0 here than to suddenly have your entire service crash because of mysterious OOMs. The situation is mildly better when doing an interactive computation, but only mildly; getting an unpredictable OOM still sucks.
Could you explain what you mean here? I don't see how this is the case, particularly not in the value domain. The closest comparison I can see is integer divide by zero, but the result isn't representable within the same type there. If we had a lot of other cases of throwing divide-by-zero exceptions for floating-point arrays (or integer overflow, etc.) I'd have a different opinion here. I do a lot of sparse linear algebra every day, and have never wanted it to throw more errors. A library's job is to give the result the user asks for, whenever possible.
This is what I mean about promotion rules. We started out promoting everything to `Int`. I get why …
Unpredictably allocating huge amounts of memory is a particularly nasty form of unpredictable behavior in applications, since there's a very strong chance of it causing an entire machine to go down, not just one task. Throwing an error will only cause one task to fail, in an easily recoverable and debuggable way. Note that if you do …
There's a danger that if …
I don't follow how the integer promotion applies here. Sparse matrices are known for having lots of operations that can unpredictably allocate large amounts of memory, not all of which can be determined just from the types of the inputs. It's not appropriate to error for trying to take the unpivoted Cholesky factorization of an arrowhead matrix, is it? Or a sparse matrix-matrix multiply with some especially dense columns/rows?
You asked about design decisions where we sacrificed mathematical correctness for the sake of not allocating too much memory; integer promotions are an example. I'm not going to fight this; I've made my case.
Oh I see, that wasn't my question then; I was asking about value-domain errors. We do get a few more InexactErrors now... Predictable memory use is something that sparse matrices can't achieve for all possible inputs and all possible operations (even the ones that you expect to preserve sparsity most of the time). Much of the interesting work in the field is about doing reorderings and partitionings to reduce fill-in, but the fact that there are degenerate expensive cases is unavoidable.
Yes, I think even if we want the policy "error on operations that blow up sparse matrices", the only reasonable way to implement it is with OutOfMemoryError. There isn't a general version of such a check. I also think there are good reasons to have Int8 + Int8 = Int8 other than memory used by array operations.
There are, but this is still the history of how these operations worked and why we changed them.
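(For readers following along, a quick illustration of the behavior being referenced, in current Julia:)

```julia
# Small-integer arithmetic stays in the same type instead of promoting to
# Int, trading overflow safety for predictable memory use.
a = Int8(100) + Int8(100)   # wraps on overflow: the answer is -56, an Int8
typeof(a)                   # Int8

# For arrays this is the point: element type, and so memory, is preserved.
v = Int8[1, 2, 3]
typeof(v .+ Int8(1))        # Vector{Int8}, one byte per element
```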
I thought the real purpose of NaN in IEEE-754 was for deferred exception detection? I'm not sure I understand why it is important that this doesn't throw a DivideByZero error but instead is able to generate a dense array of `NaN`s and `Inf`s.
Giving errors instead of NaNs is generally defensible, but as you observe we'd have to decide to do that consistently everywhere. On the other hand, 1/0 == Inf is not really an error in the same way, and does the right thing in some computations.
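(A small illustration of that deferred-exception style of numerical error handling:)

```julia
# IEEE-754 NaN/Inf let a computation run to completion and be checked
# afterwards, rather than aborting partway through.
x = [1.0, 0.0, 2.0]
y = 1 ./ x                 # [1.0, Inf, 0.5]; 1/0 == Inf, no error is thrown
z = y .- y                 # Inf - Inf produces NaN at index 2
any(isnan, z)              # true: detect the invalid operation at the end
```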
The specific method discussed here has perfectly predictable memory use if …
I agree with @tkelman here. If there's a well-defined answer, we should just return it. Using too much memory, or unpredictable amounts of memory, is not an error unless you actually run out, which is the domain of OutOfMemoryError (which unfortunately we need to hack on more, since OSes have declined to handle the issue in a sane way).
Sorry about the volume of debate on your first PR @pgentili! I'll continue on #14973 for the non-finite cases.
Is the idea of supporting non-zero sparse entries (#10410) really out of the question? It's been mentioned several times in the past (for example here, where @StefanKarpinski referred to another discussion I can't find). I must say I have no particular need for this myself, but it would solve all the issues discussed here, as well as … (If that's a "no", please do not let my question derail this already long discussion; debates can go to #10410.)
The discussion about computations producing NaN is not new; see #5234. From what I can tell, the only inkling of a rule we have in Julia is …
There's really no reason to hold up this PR for literally an edge case, so I would be in favor of accepting a PR that checks its inputs. In this case, I would prefer … Note also what this PR makes of this case:

```julia
julia> scale(sparsevec(zeros(5)), Inf)
Sparse vector of length 5, with 0 Float64 nonzero entries:
```
Yes, `scale` gets this wrong and I plan on fixing it; see my todo in #14973.