IEEE 754 has the unfortunate consequence that floating point numbers don't always satisfy x*0 == 0 (e.g. NaN*0 == NaN and Inf*0 == NaN). That x*0 == 0 holds is fundamental to the decoupling between the symbolic and numerical computations for sparse matrices, which is arguably one of the most important optimizations for sparse matrix algorithms. Right now, Julia's sparse linear algebra code uses the sparsity pattern optimization extensively and is therefore not IEEE compliant, e.g.
julia> eye(2)*[NaN,1.0]
2-element Array{Float64,1}:
NaN
NaN
julia> speye(2)*[NaN,1.0]
2-element Array{Float64,1}:
NaN
 1.0

Another example is
julia> (-eye(2))[1,2]
-0.0
julia> (-speye(2))[1,2]
0.0
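The first example falls directly out of how a compressed sparse column (CSC) matrix-vector product is usually written. A minimal sketch, assuming the standard colptr/rowval/nzval layout (the names here are illustrative, not Base's actual code): the inner loop visits only the stored entries of each column, so a NaN or Inf in x is never multiplied against a structural zero, and that product is simply never formed.

```julia
function csc_matvec(n, colptr, rowval, nzval, x)
    y = zeros(n)
    for j in 1:n                             # for each column j
        xj = x[j]
        for k in colptr[j]:(colptr[j+1]-1)   # stored entries of column j only
            y[rowval[k]] += nzval[k] * xj
        end
    end
    return y
end
```

For the 2×2 identity, column 1 has a single stored entry in row 1, so a NaN in x[1] propagates only to y[1] while y[2] stays 1.0, exactly as in the speye example above.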
Notice that IEEE compliance for the latter example (with the current sparse matrix implementation) would require storing all the -0.0 entries explicitly, i.e. 2n^2 + n + 1 instead of 3n + 1 elements. This is also known as MemoryError.
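The counts come from the CSC layout: colptr holds n+1 integers, and rowval/nzval each hold one value per stored entry. A quick sketch of the arithmetic, assuming that layout:

```julia
# n×n identity, storing only the n diagonal entries:
#   (n+1) colptr + n rowval + n nzval = 3n + 1
diag_only(n) = 3n + 1

# Storing every -0.0 that IEEE negation of the structural zeros would
# produce means nnz = n^2:
#   (n+1) colptr + n^2 rowval + n^2 nzval = 2n^2 + n + 1
all_entries(n) = 2n^2 + n + 1
```

Already at n = 10_000 that is the difference between roughly 3×10^4 and 2×10^8 stored numbers.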
The main question in this issue is to choose one of the two options below for sparse matrices (and probably all types with structural zeros):

1. x*0 == 0 for all x
2. x*0 != 0 for some x, aka IEEE 754
where 0 is a structural zero. I think that 1. is the best choice, since I believe the symbolic/numerical decoupling is very important when working with sparse matrices, and I'm not sure that 2. is desirable given the computational and memory costs, or even achievable at all. I.e. one thing is that it would be a lot of work to handle the NaN and Inf cases in all of our sparse code, and it is unclear who would put in the hours to do so, but what about sparse direct solvers? They optimize over the sparsity pattern without considering NaNs or Infs, and I'm not sure how we could handle that case. In theory, we could also choose 2. for some linear algebra functions and 1. for others, but I think that would be more confusing and error prone.
However, we'd need to figure out the implications of going with 1. Should we consider throwing errors instead of producing NaNs and Infs during sparse calculations, e.g. on division by zero? What about sparse broadcast, which now (I believe) follows IEEE 754? Since broadcast allows all kinds of operators, things are much more complicated there. If sparse broadcast behaves differently from sparse linear algebra, we'd need to be careful when using broadcast for convenience in the linear algebra code (but I think that would be pretty simple to handle).
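To illustrate why broadcast can't always exploit sparsity: for a function f with f(0) != 0, the result of broadcasting f over a sparse matrix is dense in principle, since every structural zero maps to a nonzero value. A small sketch using the current SparseArrays stdlib (sparse(1.0I, 2, 2) is the modern spelling of speye(2)):

```julia
using LinearAlgebra, SparseArrays

A = sparse(1.0I, 2, 2)   # modern spelling of speye(2)

# cos(0.0) == 1.0, so every structural zero must become a stored entry:
nnz(cos.(A))   # 4 stored values

# abs(0.0) == 0.0, so the identity's sparsity pattern survives:
nnz(abs.(A))   # 2 stored values
```

So for broadcast, whether the sparsity pattern can be preserved depends on the value of f at zero, which is exactly the kind of case analysis that option 1. lets plain linear algebra skip.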
Finally, after discussing this issue in a couple of PRs, I'm pretty confident that we won't be able to reach a unanimous conclusion, so once we think we understand the consequences, we'll probably need a vote. Otherwise, we'll keep having this discussion every time a PR touches one of the affected methods.