Sparse Jacobians #227
4 comments · 5 replies
-
I'm not sure at the moment how a solution that is both general and efficient can be designed for this (but I believe it is possible). It would depend heavily on the sparsity structure of the problem. For example, for block-diagonal matrices (which your example fits), with, say, a 4x4 block size, you could group the functions in sets of 4 and then compute the Jacobian of those 4 functions together (see the sketch at the end of this comment). For more complicated and unstructured sparsity, the strategy would not be as trivial (but still possible, although you may have to know the "dependence network" among functions and variables in advance). Another solution to your example would be to define … Can you see that you are not evaluating …
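To make the block-diagonal strategy concrete, here is a rough sketch under some assumptions: the block size k divides n, the example f and the helper blockDiagonalJacobian are illustrative rather than library API, and jacobian is assumed to accept a lambda (it is a function template, so it should):

```cpp
#include <autodiff/forward/real.hpp>
#include <autodiff/forward/real/eigen.hpp>
using namespace autodiff;
using namespace Eigen;

// Illustrative system with 2x2 diagonal blocks: row r depends only on the
// columns in its own block, so J(r, c) != 0 implies r / 2 == c / 2.
VectorXreal f(const VectorXreal& x)
{
    const int n = x.size();
    VectorXreal out(n);
    for (int b = 0; b < n; b += 2) {
        out[b]     = x[b] * x[b + 1];
        out[b + 1] = cos(x[b]) + x[b + 1];
    }
    return out;
}

// Recover the full Jacobian from one jacobian() call over k seed variables
// instead of n: differentiate g(t) = f(x0 + S t), where S scatters t[j] to
// every column c with c % k == j. Columns that agree modulo k never share a
// nonzero row, so B = J * S holds every nonzero of J exactly once.
MatrixXd blockDiagonalJacobian(const VectorXd& x0, int k)
{
    const int n = x0.size();
    auto g = [&](const VectorXreal& t) -> VectorXreal {
        VectorXreal xt(n);
        for (int i = 0; i < n; ++i)
            xt[i] = x0[i] + t[i % k]; // shift: f_i keeps its own evaluation point
        return f(xt);
    };
    VectorXreal t = VectorXreal::Zero(k);
    VectorXreal F;
    MatrixXd B = jacobian(g, wrt(t), at(t), F); // n x k compressed Jacobian
    MatrixXd J = MatrixXd::Zero(n, n);
    for (int r = 0; r < n; ++r)
        for (int j = 0; j < k; ++j)
            J(r, (r / k) * k + j) = B(r, j);    // scatter back via the pattern
    return J;
}
```

For the 2x2 blocks above, blockDiagonalJacobian(x0, 2) differentiates w.r.t. 2 seed variables regardless of n.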
-
One way to do it would be to add the functions together, since no two of them depend on the same variable. So, for example, one could modify the function by summing its components into a single scalar function, as sketched below.
Then take the gradient with respect to x at x = [2, 1] and insert the values back into the Jacobian according to its sparsity pattern. I will see if I can develop a solution based on this.
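A minimal sketch of what that summed function and the scatter step might look like for the example (fsum and the hard-coded diagonal pattern are illustrative):

```cpp
#include <autodiff/forward/real.hpp>
#include <autodiff/forward/real/eigen.hpp>
using namespace autodiff;
using namespace Eigen;

// Sum of the example's components: f_1 + f_2 = x[0] + cos(x[1]). Since
// x[0] appears only in f_1 and x[1] only in f_2, each gradient entry of
// the sum equals the corresponding diagonal entry of the Jacobian.
real fsum(const VectorXreal& x)
{
    return x[0] + cos(x[1]);
}

int main()
{
    VectorXreal x(2);
    x << 2, 1;

    real u; // value of fsum at x, filled in by gradient()
    VectorXd g = gradient(fsum, wrt(x), at(x), u); // g = [1, -sin(1)]

    MatrixXd J = MatrixXd::Zero(2, 2);
    J(0, 0) = g[0]; // scatter the gradient back onto the known pattern
    J(1, 1) = g[1];
}
```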
-
I can only get this to work by taking the approach in the first post, i.e. changing all the independent functions so that they depend on x[0] only and then taking the derivative w.r.t. x[0]. For a 10000 x 10000 matrix I can compute the Jacobian in 4 ms, whereas the typical naive approach of differentiating w.r.t. every entry of x takes 34.8 seconds. The only problem is that this only works if all the values are the same, i.e. x[0] = x[1] = x[2]. I can get it to work when splitting up the functions as suggested by @allanleal, but I don't see a good approach for turning this into a generalised method that works for functions of arbitrary size. Not sure how to continue at this moment.
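One way around the x[0] = x[1] = x[2] restriction might be to shift all variables by a common seed instead of substituting x[0] for them: differentiating g(t) = f(x0 + t * ones) at t = 0 gives J * ones, which for a diagonal Jacobian is exactly the vector of diagonal entries, while every f_i is still evaluated at its own x0[i]. A sketch under those assumptions (the diagonal example f and the helper name are illustrative):

```cpp
#include <autodiff/forward/real.hpp>
#include <autodiff/forward/real/eigen.hpp>
using namespace autodiff;
using namespace Eigen;

// Illustrative function with a diagonal Jacobian: f_i depends on x_i only.
VectorXreal f(const VectorXreal& x)
{
    VectorXreal out(x.size());
    for (int i = 0; i < x.size(); ++i)
        out[i] = cos(x[i]);
    return out;
}

// One derivative pass for the whole diagonal, valid at any x0.
VectorXd diagonalOfJacobian(const VectorXd& x0)
{
    auto g = [&](const VectorXreal& t) -> VectorXreal {
        VectorXreal xt(x0.size());
        for (int i = 0; i < x0.size(); ++i)
            xt[i] = x0[i] + t[0]; // shift by t, do not replace x[i] with x[0]
        return f(xt);
    };
    VectorXreal t = VectorXreal::Zero(1);
    VectorXreal F;
    MatrixXd B = jacobian(g, wrt(t), at(t), F); // n x 1: df/dt = J * ones
    return B.col(0);
}
```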
-
I found an article about this: "What Color Is Your Jacobian? Graph Coloring for Computing Derivatives". It would be great if this could be implemented.
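For reference, the core of the paper's approach: color the columns of the sparsity pattern so that two columns share a color only if no row has nonzeros in both; the columns of each color can then be differentiated together in one compressed pass, as in the summing/shifting tricks above. A naive greedy sketch of such a coloring (the colorColumns name and the row-wise pattern format are assumptions, not from the paper or from autodiff):

```cpp
#include <algorithm>
#include <vector>

// Greedy structurally-orthogonal column coloring. pattern[r] lists the
// column indices with a nonzero in row r. Returns one color per column;
// the number of distinct colors is the number of derivative passes needed
// (1 for a diagonal matrix, k for k x k block-diagonal ones).
std::vector<int> colorColumns(const std::vector<std::vector<int>>& pattern, int ncols)
{
    std::vector<int> color(ncols, -1);
    for (int c = 0; c < ncols; ++c)
    {
        std::vector<bool> forbidden(ncols, false);
        for (const auto& row : pattern)
        {
            if (std::find(row.begin(), row.end(), c) == row.end())
                continue; // this row has no nonzero in column c
            for (int other : row) // columns conflicting with c in this row
                if (color[other] != -1)
                    forbidden[color[other]] = true;
        }
        int k = 0;
        while (forbidden[k]) ++k; // smallest color not used by a conflict
        color[c] = k;
    }
    return color;
}
```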
-
Hi,
I have been using autodiff for a while now and I really like the software a lot. Great job.
I want to try to improve performance for sparse Jacobians. Consider the function f: R^2 -> R^2, where
f_1 = x_1
f_2 = cos(x_2)
The Jacobian can be computed by autodiff as in example-forward-jacobian-derivatives-using-eigen. The function is defined as:
```cpp
VectorXreal f(const VectorXreal& x)
{
    VectorXreal out(2);
    out << x[0], cos(x[1]);
    return out;
}
```
Then, the autodiff jacobian function is called to give the solution:
```cpp
MatrixXd J = jacobian(f, wrt(x), at(x), F);
```
However, note that the Jacobian is diagonal, so it could be computed with one cycle of the ForEachWrtVar loop instead of two, since the two functions are independent. This of course becomes much more relevant for larger matrices.
The idea is to identify which rows of the Jacobian can be computed together; in this case, f_1 and f_2 both belong to group 1. Hence, the function could be modified as:
f_1 = x_1
f_2 = cos(x_1)
to differentiate both functions using only one cycle instead of two. However, now the second function is evaluated at x_1 instead of x_2! So, I need some functionality to evaluate the gradient at different values per row.
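In code, the modified function and its pitfall look like this sketch (fCompressed is an illustrative name):

```cpp
// Both components now read x[0], so one derivative cycle covers both rows...
VectorXreal fCompressed(const VectorXreal& x)
{
    VectorXreal out(2);
    out << x[0], cos(x[0]); // ...but cos is evaluated at x[0], wrong unless x[0] == x[1]
    return out;
}
```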
Any ideas for how I could improve this would be very welcome :-)
Best regards
Kristian