Hi there! I'm hoping JAX can do what other frameworks cannot. I need to compute just a subset of the input-output Jacobian (J) of some function f. Since f is large (many parameters), I would like to treat J as a block matrix and extract only the "A" block. Is this possible with a custom VJP rule? If I use a JVP, I can get "A" by masking out the other blocks, but I still need to represent V as a large matrix. In addition, with JVP/VJP solutions the output is still the full size of J, which could be difficult to store in memory. Thanks for any help!
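For concreteness, here is a toy sketch of the JVP-with-masking idea I have in mind (the small `f`, `x`, and block size `k` are invented purely for illustration):

```python
import jax
import jax.numpy as jnp

def f(x):
    # toy stand-in for my large function
    return jnp.tanh(x)

x = jnp.arange(6.0)
k = 2  # size of the "A" block

# One JVP per basis vector e_j recovers column j of J;
# keeping only the first k outputs gives J[:k, j].
basis = jnp.eye(x.size)[:k]
cols = jax.vmap(lambda v: jax.jvp(f, (x,), (v,))[1][:k])(basis)
A = cols.T  # shape (k, k): the top-left block of J
```

The issue is that `basis` is still a large set of tangent vectors when `x` is large.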
Have you tried using standard slicing to return the portion of the Jacobian matrix you're interested in? Under JIT, the intermediate objects are not actually computed, and the XLA compiler is able to trim the parts of the computation that are unnecessary. For example:

```python
import jax
import jax.numpy as jnp
import numpy as np

np.random.seed(1701)
N = 1000
f_mat = np.array(np.random.rand(N, N))

def f(x):
    return jnp.sqrt(f_mat @ x / N)

x = np.array(np.random.rand(N))

# -----------
# Full Jacobian
f1 = jax.jit(lambda x: jax.jacfwd(f)(x))
J_f1 = f1(x)
print(J_f1.shape)
# (1000, 1000)
%timeit f1(x)
# 100 loops, best of 5: 6.96 ms per loop

# ----------
# Partial Jacobian
f2 = jax.jit(lambda x: jax.jacfwd(f)(x)[:5, :5])
J_f2 = f2(x)
print(J_f2.shape)
# (5, 5)
%timeit f2(x)
# 1000 loops, best of 5: 214 µs per loop
```
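If you also want a guarantee that the full Jacobian is never materialized (for example, when running without `jit`), another option is to differentiate a restricted function whose Jacobian is exactly the "A" block. A minimal sketch, reusing the same `f`, `x`, and `N` as above (the helper name `g` and the block size `k` are just for illustration):

```python
import jax
import jax.numpy as jnp
import numpy as np

np.random.seed(1701)
N = 1000
f_mat = np.array(np.random.rand(N, N))

def f(x):
    return jnp.sqrt(f_mat @ x / N)

x = np.array(np.random.rand(N))
k = 5

# Restrict f to its first k inputs and outputs, holding the
# remaining inputs fixed at their values in x. The Jacobian of
# g at x[:k] is then exactly the (k, k) "A" block of J.
def g(x_head):
    full = jnp.concatenate([x_head, x[k:]])
    return f(full)[:k]

A = jax.jit(jax.jacfwd(g))(x[:k])
print(A.shape)
# (5, 5)
```

Here forward-mode only ever pushes k tangent vectors through f, so neither the tangents nor the output are ever the full size of J.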