Simple adjoint method #41
Agree this would be a great feature! Here I can dump some references and notes from past projects:

As you say, in principle autograd can take care of most of the input-output relationship, especially since the scikit-fem FEM assembly is purely Python-based with simple operations. They have work on integrating autograd. They use it to find Jacobians for nonlinear problems, but it could just as easily be used for optimization.

The issue is when we leave Python to do the main compute, for instance the linear solve (Coulomb, heat equation) or eigensolve (mode solving). It would be good for this library to provide the most generic solution, i.e. a primitive for the solver, leaving everything around it to autograd, so that a user can define an arbitrary problem around it and just treat the linear or mode solve like a regular Python operation.

Linear solvers and dense eigensolvers have many primitives in popular autograd-enabled packages, e.g. PyTorch and JAX. Dense eigensolvers in particular have a simple expression using the entire spectrum:

Mike B. Giles. "Collected Matrix Derivative Results for Forward and Reverse Mode Algorithmic Differentiation." Springer, pp. 35-44, 2008.

So these just work. For nonlinear problems where you apply these iteratively you might still prefer to write your own adjoint. For generic sparse eigensolvers this is also still required; it's an open problem on JAX.

For a specific problem we can derive our own adjoint and implement our own PyTorch or JAX primitive for embedding into a backpropagation pipeline. As you say, this looks like a linear solve reusing the results of the forward eigensolve, combined with the reverse-mode derivatives of its outputs w.r.t. upstream computation.

I have a simple implementation of a JAX sparse eigensolver primitive for the finite-difference 1D Schrödinger equation lying around. It handles both the eigenvalues and eigenvectors. I will see how difficult it is to port over to FEM (started #42 for the forward solve). Hopefully this can be readily extended to the Maxwell wave equation.
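To make the "primitive for the solver" idea concrete, here is a minimal sketch of the reverse-mode (VJP) rules for a linear solve, written in plain NumPy so the rules are explicit rather than hidden behind a framework. In a real pipeline these two functions would back a JAX `custom_vjp` or an autograd `defvjp`; all names here are illustrative, and the rules are the standard ones from Giles (2008).

```python
import numpy as np

def solve_fwd(A, b):
    """Forward pass: solve A x = b, keeping what the backward pass needs."""
    x = np.linalg.solve(A, b)
    return x, (A, x)

def solve_vjp(residuals, x_bar):
    """Backward pass: given x_bar = dL/dx, return (A_bar, b_bar).

    Standard adjoint rules for a linear solve:
        b_bar = A^{-T} x_bar
        A_bar = -b_bar x^T
    """
    A, x = residuals
    b_bar = np.linalg.solve(A.T, x_bar)
    A_bar = -np.outer(b_bar, x)
    return A_bar, b_bar

# Finite-difference check on a toy loss L = c . x
rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n)) + n * np.eye(n)  # well-conditioned
b = rng.standard_normal(n)
c = rng.standard_normal(n)

x, res = solve_fwd(A, b)
A_bar, b_bar = solve_vjp(res, c)  # seed the backward pass with dL/dx = c

eps = 1e-6
# Directional derivatives along the computed adjoints
db_fd = (c @ np.linalg.solve(A, b + eps * b_bar) - c @ x) / eps
dA_fd = (c @ np.linalg.solve(A + eps * A_bar, b) - c @ x) / eps
assert np.isclose(db_fd, b_bar @ b_bar, rtol=1e-4)
assert np.isclose(dA_fd, np.sum(A_bar * A_bar), rtol=1e-3)
```

Note that the backward pass is itself just another linear solve with the transposed operator, which is why wrapping an external solver as a primitive is cheap: the same factorization or solver routine can be reused.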
Thanks for chiming in!
You typically don't want to differentiate the solver routine itself: you end up differentiating the error of the solver too, which means you must needlessly track a large amount of useless data (often making problems intractable). The trick is to implement the VJP (or pullback) ourselves by differentiating the operator (it can be a discrete operator) using AD and plugging that into the recombination step of the adjoint method.

Again, the main benefit here is that we effectively get the gradient for free, since it's just a function of the eigenfields. (The group index/velocity, which is just the derivative of the eigenvalue, leverages this fact. This is one way to derive the integral often used to compute these quantities.)
I may be misunderstanding you, but you shouldn't be limited to a specific problem. Once you can AD the operator, you can automate any problem (like we do with meep).
We're saying the same thing, although your vocabulary is probably more exact than mine :) Yes, we don't want to backpropagate through an iterative solver, so we need to create the AD object with closed forms for the forward evaluation (matrix entry inputs --> evals and evectors) and backward evaluation (adjoints of evals and evectors --> adjoints of matrix entry inputs).

I'm wondering if we can instantiate this as the AD of a generic "sparse eigensolver" operator that would work for all sorts of wave problems, instead of an E&M-, Schrödinger-, or mechanical-modes-specific eigensolver operator.
As long as each sparse (or dense) matrix has a routine that populates the matrix based on a set of input variables that are somehow related to your DOF, then all you need to do is differentiate that routine using AD (for whatever problem you care about). Then you just "recombine" the eigenvectors carefully. Typically, this "brute force" approach has some sort of inefficiency, so I like to go through the math of each operator (not as hard as it sounds) and I usually find some interesting simplification that allows one to scale the problem much better.
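The "differentiate the assembly routine, then recombine the eigenvectors" recipe can be sketched in a few lines. This is an illustrative toy, not skfem: the "assembly" is a stand-in sum of symmetric element matrices A(p) = sum_i p_i K_i, and the recombination uses the Hellmann-Feynman identity dlambda/dp_i = x^T (dA/dp_i) x for a simple symmetric eigenproblem.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 8, 3
# "Element" matrices: symmetric building blocks of the toy assembly
Ks = [(lambda M: M + M.T)(rng.standard_normal((n, n))) for _ in range(m)]

def assemble(p):
    """Stand-in for an FEM assembly loop: A(p) = sum_i p_i * K_i."""
    return sum(pi * Ki for pi, Ki in zip(p, Ks))

def lowest_mode(p):
    """Forward solve: lowest eigenpair of the assembled symmetric matrix."""
    w, V = np.linalg.eigh(assemble(p))
    return w[0], V[:, 0]

p = np.array([1.0, 2.0, 3.0])
lam, x = lowest_mode(p)

# Recombination step: here dA/dp_i = K_i, so the full gradient of the
# eigenvalue needs no extra solve, just quadratic forms in the eigenvector.
grad = np.array([x @ K @ x for K in Ks])

# Finite-difference check
eps = 1e-6
fd = np.array([(lowest_mode(p + eps * e)[0] - lam) / eps
               for e in np.eye(m)])
assert np.allclose(grad, fd, rtol=1e-3, atol=1e-4)
```

In a real FEM setting dA/dp_i would come from applying AD to the assembly routine rather than being available in closed form, but the recombination step is the same.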
If I may chime in for a second - I've been using femwell (very cool btw!) for a bit these past days and have been thinking about pretty much the same thing.

I think that differentiating through the solver and the finite element assembly are both relatively straightforward technical problems, with the assembly part being a bit more tricky, mostly because most of it is done in an external package (skfem). In the case of eigensolvers, there might be some issues depending on whether or not the eigenvectors are needed and on the properties of the system matrix, but for thermal/optical simulations there shouldn't be any showstoppers.

The real issue I see lies in the meshing. In these problems you are typically interested in shape derivatives (e.g., effective index w.r.t. the width of a waveguide), i.e. not the derivatives w.r.t. (just) the material parameters (as in topology optimization). This makes the problem quite tricky, I think. First, you need to be able to get the derivatives of the FE matrix w.r.t. the positions of the mesh nodes. Then there is the problem that moving nodes is only possible up to a certain point, at which point you have to remesh, which makes the problem discrete because you are changing the structure of the matrix. So a kind of hands-off "let autograd take care of everything" approach is not going to work.

If you are not going for very large changes in the geometry of your structure, one way around this could be to finely mesh some region of interest and then find some smart way of hopping around your nodes (and somehow interpolating in between?). If the expected changes are large, this is not feasible, as you would need to finely mesh everything, defeating the point of FEM.

There is also the more "hacky" way of ignoring the fact that your mesh actually defines your geometry and parametrizing the material in a density-based topopt fashion such that you get "quasi-boundaries". But then these boundaries do not necessarily follow your mesh anymore, which is not great. It also means that you are not simulating "proper" geometric objects anymore (with neatly defined material boundaries), which requires interpolation, and it still requires fine meshing of the entire optimization region (or remeshing between iterations if you want to be smart about it...). I believe this is how a lot of topology optimization is done in FEM; many of the typical examples in mechanics actually just use quadrilateral elements and work on what is essentially a regular grid.

In any case, I just wanted to mention this because I think it's something to consider.
I agree, @MRBaozi, dealing with the parameterization is tricky. But luckily there are some established solutions here. If you are interested in shape optimization, as opposed to TO, then you need a level-set parameterization (as I'm sure you know), regardless of the underlying discretization scheme. Luckily this can be very straightforward. With implicit level sets, you optimize the level-set evolution, and you can remesh each iteration. Typically, however, you can't leverage "mathematical" optimization algorithms like CCSA/MMA, as you need to evolve the level set itself. With explicit level sets, on the other hand, you can use traditional optimization algorithms (I think in the FEM world they call this an xFEM approach...). The FEM basis is a continuous interpolator, so there are many ways to do this. Moving node positions themselves, however, is quite tricky, and typically requires some SGD-like algorithms.
This isn't necessarily true (but I see where you're coming from). It depends on how you parameterize things (see below).
Even if you have to remesh, you can still use AD. You can still get continuous, well-defined gradients, for example by using subpixel smoothing. In other words, you have a parameterization that abstracts away the underlying mesh (you just need a mapping from your level set to your new mesh each iteration). There are other ways to do this too (other than subpixel smoothing). Now, the AD may be hard to implement (this is where a Julia implementation would be really nice, as there are several end-to-end AD packages like Enzyme.jl), but it's totally fine to approximate. Ch. 4 of my dissertation describes these nuances in a bit more detail.

A lot of these solutions require a different degree of work. My approach is to look at a particular problem first, and iteratively apply the right solution for that problem (rather than try to generalize a package up front with all conceivable use-cases). So the first step is to find a useful problem (e.g. modulator RF/photonics co-design) and go from there.
Level sets are one way to parametrize the boundaries of arbitrary shapes; you only need them (in the sense that this is the established method) for generalized shape optimization, which is already pretty close to topology optimization anyway, and they have their own quirks of course (e.g., the level set is usually solved on a different mesh than the simulation). That being said, I'm sure we can agree that whatever parametrization one chooses, this is likely where most of the implementation effort lies. As I said previously, I just wanted to raise this point because it was already on my mind and I believe it is more involved than the "autodiff (eigen)solvers" part of the problem. And as you said, if the mesh assembly is fast, then doing finite differences there is also an option.

And I absolutely agree that the most productive way forward in any case is to tackle a (preferably basic) problem and see where it goes from there. I will be playing around with this a bit myself (mostly out of curiosity) if I can find the time, but no promises 😅
Hey @smartalecH, @simbilod, @MRBaozi, you guys seem quite versed in applying the adjoint method to all kinds of problems (I am more of a rookie...). I would like to implement gradients on FDE simulations (both the eigenvalues and eigenvectors). Or, if it's already out there, just use it :)
To the eigenvalue formulation behind the FDE, e.g. as in:
But that seems to me like overcomplicating the issue and disregarding some useful properties of EM eigenmodes!?
Hey @Jan-David-Black, I am not particularly familiar with FDE, but I assume this means that you have some index profile on a regular grid, build a system matrix from it, and then solve the eigenproblem for that matrix.
Exactly, fundamentally it is not even related to some particular physics. The main difference here is how you build up the system matrix, not how you solve the actual eigenproblem.

I guess you have an eigenvalue problem somewhere in your code, and the operation of retrieving the eigenvalues and eigenvectors is what you want to differentiate through. Some useful references that come to mind are this extended collection of matrix derivative results (section 3.1, "Eigenvalues and eigenvectors") and Steven Johnson's notes on adjoint methods (section 4, "Eigenproblems"). Also check out Appendix 4 of this technical report. The math tells you what terms you need to calculate, and then you look for efficient ways to implement them. In the simplest case, you can essentially just replace your call to the eigensolver with an autodiff-aware version.

Probably you are using sparse matrices though, which makes it a bit more tricky, because sparse array support is extremely lacking in all autodiff frameworks that I know of. You can check out e.g. legume from S. Fan's group at Stanford; they need a differentiable eigensolver for the guided mode expansion. In particular, you can check out how they implemented the autograd primitive here. I haven't checked whether something similar exists for jax, but I wouldn't be surprised. Hope that helps!
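To illustrate the pattern (a sparse eigensolver wrapped as a primitive with a hand-written backward rule), here is a small sketch in the same spirit, but not legume's actual code: it uses SciPy directly instead of an autograd `@primitive`/`defvjp` pair so the backward rule is visible, and it covers the eigenvalue adjoints only (the eigenvector adjoint needs an extra linear solve). All function names are illustrative.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def sparse_eigsh_fwd(A, k=2):
    """Forward: k smallest eigenpairs of a sparse symmetric A."""
    w, V = spla.eigsh(A, k=k, which="SA")
    return (w, V), (A, w, V)

def sparse_eigsh_vjp_eigvals(residuals, w_bar):
    """Backward for eigenvalues only: given w_bar = dL/dw, return
    A_bar = sum_k w_bar[k] * x_k x_k^T (Hellmann-Feynman).
    Returned dense here for clarity; in practice you would only
    evaluate it on the sparsity pattern of A."""
    A, w, V = residuals
    return sum(wb * np.outer(v, v) for wb, v in zip(w_bar, V.T))

# Check on a 1D finite-difference Laplacian (a toy "wave operator")
n = 40
L = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")
(w, V), res = sparse_eigsh_fwd(L, k=2)

w_bar = np.array([1.0, 0.0])          # seed: dL/dw for the lowest eigenvalue
A_bar = sparse_eigsh_vjp_eigvals(res, w_bar)

# Directional finite-difference check along a random symmetric perturbation
rng = np.random.default_rng(2)
M = rng.standard_normal((n, n))
dV = (M + M.T) / 2
eps = 1e-6
w2, _ = spla.eigsh(sp.csc_matrix(L.toarray() + eps * dV), k=2, which="SA")
fd = (w2[0] - w[0]) / eps
assert np.isclose(fd, np.sum(A_bar * dV), rtol=1e-3)
```

The forward solve stays sparse and iterative (ARPACK under the hood); only the recombination in the backward pass touches the eigenvectors, which is exactly why the VJP is written by hand instead of differentiating through the Lanczos iterations.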
Jep, that is how it works. It relies on a staggered Yee grid, like FDFD and FDTD do, which lends itself to the Maxwell curl equations.
I have been skimming through legume's implementation. To understand what's going on I'll have to put in a lot more attention though... Their Readme states that:
To my knowledge Floris only has an FDTD package. And the underlying math seems relatively different at first glance, as FDTD/FDFD is not an eigenvalue problem?! At least from Alec's thesis I am not able to build the bridge towards eigenmode calculation. Thanks again for providing the references!!
Hey @smartalecH, @MRBaozi, @simbilod, @Jan-David-Black, I don't yet know enough about adjoint optimization, but from what I understand it would be really great to have these capabilities, and I'm down to help where I can :) In #70 I've added a Julia/Gridap implementation of the mode solver and also a 3D electrostatic and a 3D thermal solver to simulate heat-based phase shifters. Hope that helps :)
@HelgeGehring wow, nice work! (I'm a big Julia user, and can't wait to leverage its features in this context!) I think the best place to start is to identify a practical problem that needs adjoint optimization. Then we add the functionality as needed. (As discussed earlier, a lot of this will be problem dependent.) Any ideas? I mentioned matching the group indices of an optical and an RF waveguide mode for a modulator, but I wonder if there is something more impactful that's also lower-hanging fruit?
@smartalecH what about matching the modes of a spot-size converter and a fiber? That would however require the additional linear solve, as it depends on the eigenvector. There are only a few design parameters, so gradients are probably not super helpful here. And only optimizing for maximum performance isn't really valuable for anything that is supposed to be fabricated.
@smartalecH thanks! Matching the mode group indices sounds great! I'd guess for starting it would maybe be nice to pick something as simple as possible? Just to get it running once?
Luckily, adjoint methods for (most) eigenvalue problems tend to be rather trivial if you are optimizing the eigenvalue directly. So, for example, if you want to match the effective index of two modes by optimizing the dimensions of the waveguide, you can use classical gradient-descent methods to do so.
Since there are only a few parameters (e.g. width, height), it's reasonable to just use a derivative-free method. However, in this case it's still easy to compute the gradient using an adjoint method. Specifically, you can leverage the Hellmann-Feynman theorem, which only requires the mode fields themselves (i.e. there's no adjoint solve needed).
If you wanted to also optimize a quantity that depends on the eigenvectors (like matching the group index of two modes) then there's an extra perturbation term involving an additional linear solve (though since you are assembling an FEM matrix, you can probably perform this linear solve really easily). This sort of optimization is especially useful for modulator design, for example.
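For reference, the two cases above can be written out for the simple symmetric standard eigenproblem $A(p)\,x = \lambda x$ with $x^\top x = 1$ (a sketch only; the generalized FEM problem $Ax = \lambda Bx$ adds corresponding $B_p$ terms):

```latex
% Eigenvalue-only objective: no adjoint solve (Hellmann--Feynman)
\frac{\partial \lambda}{\partial p} = x^\top A_p \, x

% Eigenvector-dependent objective g(x): one extra (singular but consistent)
% linear solve for the adjoint vector w, constrained orthogonal to x
(A - \lambda I)\, w = (I - x x^\top)\, \nabla_x g,
\qquad w^\top x = 0,
\qquad \frac{\mathrm{d} g}{\mathrm{d} p} = -\, w^\top A_p \, x
```

The extra solve reuses the already-assembled (and shifted) system matrix from the forward eigensolve, which is why it is cheap in an FEM setting.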
The hardest part of all of this is computing the $A_u$ term, which is the derivative of your system matrix w.r.t. the DOF. But that can be automated with autodiff if done right.
But you can imagine some interesting dispersion engineering, especially when wanting to engineer RF and optical modes simultaneously. Especially if you can start to introduce some interesting DOF.
(cc @joelslaby, @joamatab)