Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update set method for sparse to only add non-zero elements #145

Open
wants to merge 2 commits into
base: SNAPSHOT
Choose a base branch
from

Conversation

Lundez
Copy link

@Lundez Lundez commented Jul 27, 2021

I think it makes sense to disregard irrelevant values for the sparse matrices. I first noticed this issue when using another library (kmath) which has a buildMatrix method they yet have to specialize for sparse structures.
This buildMatrix would be much better off if the set method never applied "0" elements. They risk of doubling the array-size way larger than required would still exist, but at least we wouldn't make sure that it happens 馃槈

The best approach forward is for kmath to better support sparse structures, but this step in the right direction would be helpful for not only kmath-user but also ejml ones 馃槃

Before submitting this PR, please make sure:

  • [-] Your code is covered by tests (something about memory breaking, not sure what...)
  • You ran ./gradlew spotlessJavaApply

@Lundez Lundez marked this pull request as ready for review July 27, 2021 15:25
@lessthanoptimal
Copy link
Owner

@Lundez thanks for the PR! Going to need to put some thought into this and make sure there are no unintended consequences. This might have also come up once before but I can't seem to find the PR or discussion. For example a potential negative is that, maybe someone wants to pre-initialize the sparse structure and now they need to jump through hoops and this is an unexpected behavior. In my personal real-world applications this isn't useful since I'm not assigning values unless it's non-zero (e.g Jacobian of a large sparse system). As an example of untended consequences, libraries which mark matrices as transposed instead of transposing them to save time/memory basically just end up with much more complex slower code and dubious memory saving.

Are there any more established sparse linear algebra libraries which have this as their default behavior? E.g. SuiteSparse

@FlorentinD @szarnyasg Any thoughts on this as well?

@szarnyasg
Copy link
Contributor

@lessthanoptimal I think this is related to an earlier discussion in the SuiteSparse:GraphBLAS repository: GraphBLAS/LAGraph#28

It's quite a lengthy one, so the issue of explicit zeros is not a trivial one. SuiteSparse:GraphBLAS can yield explicit zeros in some cases and there are also ways to drop them (e.g. with GxB_select).

Copy link
Owner

@lessthanoptimal lessthanoptimal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is going to work as intended. Once a non-zero element is already in the graph structure you want to assign it the new value no matter what. If the element is not in the graph structure and the value being assigned is zero you can safely not add it.

@@ -169,7 +171,7 @@ public void unsafe_set( int row, int col, double value ) {
int index = nz_index(row, col);
if (index < 0)
addItem(row, col, value);
else {
else if (value != 0.0) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already in the graph structure so you want to update the value.

@@ -169,7 +171,7 @@ public void unsafe_set( int row, int col, double value ) {
int index = nz_index(row, col);
if (index < 0)
addItem(row, col, value);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not in the graph structure so you could check to see if it's zero here and not add it.

@@ -231,7 +231,9 @@ public void set( int row, int col, double val ) {
if (row < 0 || row >= numRows || col < 0 || col >= numCols)
throw new IllegalArgumentException("Outside of matrix bounds");

unsafe_set(row, col, val);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see comments for triplet below.

@lessthanoptimal
Copy link
Owner

@szarnyasg Thanks for linking to that discussion!

Thanks to @FlorentinD you can remove zero elements too using CommonOps_DSCC.select()

@FlorentinD
Copy link
Collaborator

Greetings to Lund @Lundez

Personally, for a graph context, I favor having the possibility of 0 weights.

Maybe we could provide another method that does not set zero elements, instead of changing existing behavior?

@Lundez
Copy link
Author

Lundez commented Jul 29, 2021

Greetings to Lund @Lundez

Personally, for a graph context, I favor having the possibility of 0 weights.

Maybe we could provide another method that does not set zero elements, instead of changing existing behavior?

Could make sense, or doing it the other way around.
@lessthanoptimal any preference? If you still think this idea makes sense that is

Thanks for everyones input, as soon as I know which direction I should take I'll fix the comments 馃憤

@ennerf
Copy link
Contributor

ennerf commented Jul 29, 2021

I'll chime in here as well. I've tried this optimization a few times, but for my use cases the extra branching always ended up costing more than I got from any subsequent benefits. There may be cases where it's worth it, but I'd vote against making it the default behavior.

@lessthanoptimal
Copy link
Owner

@ennerf thanks for that insight! What matrix size are you dealing with again? Curious if this could be replicated in a benchmark.

@lessthanoptimal
Copy link
Owner

@Lundez Right now I'm leaning towards having a distinctive setter that filters out zeros. That will also have a lower bar for being added as it can't potentially break a ton of code. Also forgot to mention in the PR, unit tests are missing.

Unrelated to this PR are you a contributor to kmath? I might have some performance suggestions that could be discussed outside.

@lessthanoptimal
Copy link
Owner

For what it's worth, it looks like Matlab and Octave filter out and remove 0 when assigning.

a=sparse(10,5)
a(3,4) = 10
a(4,5) = 0
a(3,4) = 0

Octave Outputs:

Compressed Column Sparse (rows = 10, cols = 5, nnz = 0 [0%])

Compressed Column Sparse (rows = 10, cols = 5, nnz = 1 [2%])
(3, 4) ->  10

Compressed Column Sparse (rows = 10, cols = 8, nnz = 1 [1.2%])
(3, 4) ->  10

Compressed Column Sparse (rows = 10, cols = 8, nnz = 0 [0%])

@Lundez
Copy link
Author

Lundez commented Jul 29, 2021

@lessthanoptimal makes sense 馃憤

I'm not a contributor, but I guess I could get a PR up.
Currently I've added my own methods on top of their lib to optimize the usage of sparse matrices (such as defining size & only non-zero elements).
The thing I like about kmath is that it'll simplify a transition from pure JVM libraries to native if I need to make that change later for my NLP-library which will include Deep Learning at some time where I need the GPU optimizations (like they got support for ND4J & PyTorch Tensors already).

@ennerf
Copy link
Contributor

ennerf commented Jul 29, 2021

@lessthanoptimal I think the largest CSC I used to benchmark in the polynomial spline trajectory project was 9k squared with 15k entries (1k segments), but I assume that the ratio of zeros probably matters more than the raw size. As far as I remember there were very few actual zeros, but unfortunately I'm not using CSC in that project anymore and can't easily reproduce the benchmark data.

For a filtering setter it may be useful to have a minimum threshold for values that can be considered effectively zero, e.g., if( abs(value) < threshold) ignore.

@FlorentinD
Copy link
Collaborator

For more intuitive filtering based on the threshold idea by @ennerf , we could provide a 'filter(matrix, threshold)' method which delegates to select with a partly fixed lambda ('x-> |x| > threshold')

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants